Problem 


Solved 


Probability 


Second Edition 
430 fully solved problems 


• Concise explanations of all course concepts

• Information on finite and countable sets, binomial coefficients,
axioms of probability, conditional probability, and expectation of
a finite random


Elementary Probability and Statistics • Data Analysis
• Finite Mathematics • Introductory Statistics • Discrete Mathematics
• Introduction to Probability Theory

Seymour Lipschutz, Ph.D. • Marc Lipson, Ph.D.




SCHAUM’S 


outlines 


Probability 


Second Edition 


Seymour Lipschutz, Ph.D. 
Professor of Mathematics 
Temple University 


Marc Lipson, Ph.D. 


University of Virginia 


Schaum’s Outline Series 




New York Chicago San Francisco Lisbon London Madrid 
Mexico City Milan New Delhi San Juan Seoul
Singapore Sydney Toronto 


SEYMOUR LIPSCHUTZ, who is presently on the mathematics faculty at Temple University, formerly taught at the Polytechnic Institute of 
Brooklyn and was visiting professor in the Computer Science Department of Brooklyn College. He received his Ph.D. in 1960 at the Courant 
Institute of Mathematical Sciences of New York University. Some of his other books in the Schaum’s Outline Series are Beginning Linear 
Algebra, Discrete Mathematics, and Linear Algebra. 


MARC LARS LIPSON is on the faculty at the University of Virginia and formerly taught at Northeastern University, Boston University, 
and the University of Georgia. He received his Ph.D. in finance in 1994 from the University of Michigan. He is also coauthor of Schaum’s 
Outline of Discrete Mathematics with Seymour Lipschutz. 


The McGraw-Hill Companies 


Copyright © 2011 by The McGraw-Hill Companies, Inc. All rights reserved. Except as permitted under the United States Copyright Act of 
1976, no part of this publication may be reproduced or distributed in any form or by any means, or stored in a database or retrieval system, 
without the prior written permission of the publisher. 


ISBN: 978-0-07-181658-8 
MHID: 0-07-181658-5 


The material in this eBook also appears in the print version of this title: ISBN: 978-0-07-175561-0, 
MHID: 0-07-175561-6. 


All trademarks are trademarks of their respective owners. Rather than put a trademark symbol after every occurrence of a trademarked name, 
we use names in an editorial fashion only, and to the benefit of the trademark owner, with no intention of infringement of the trademark. 
Where such designations appear in this book, they have been printed with initial caps. 


McGraw-Hill eBooks are available at special quantity discounts to use as premiums and sales promotions, or for use in corporate training 
programs. To contact a representative please e-mail us at bulksales@mcgraw-hill.com. 


Trademarks: McGraw-Hill, the McGraw-Hill Publishing logo, Schaum’s and related trade dress are trademarks or registered trademarks of 
The McGraw-Hill Companies and/or its affiliates in the United States and other countries and may not be used without written permission. 
All other trademarks are the property of their respective owners. The McGraw-Hill Companies is not associated with any product or vendor 
mentioned in this book. 


TERMS OF USE 


This is a copyrighted work and The McGraw-Hill Companies, Inc. (“McGraw-Hill”) and its licensors reserve all rights in and to the work. 
Use of this work is subject to these terms. Except as permitted under the Copyright Act of 1976 and the right to store and retrieve one copy 
of the work, you may not decompile, disassemble, reverse engineer, reproduce, modify, create derivative works based upon, transmit, 
distribute, disseminate, sell, publish or sublicense the work or any part of it without McGraw-Hill’s prior consent. You may use the work for 
your own noncommercial and personal use; any other use of the work is strictly prohibited. Your right to use the work may be terminated if 
you fail to comply with these terms. 


THE WORK IS PROVIDED “AS IS.” McGRAW-HILL AND ITS LICENSORS MAKE NO GUARANTEES OR WARRANTIES AS 
TO THE ACCURACY, ADEQUACY OR COMPLETENESS OF OR RESULTS TO BE OBTAINED FROM USING THE WORK, 
INCLUDING ANY INFORMATION THAT CAN BE ACCESSED THROUGH THE WORK VIA HYPERLINK OR OTHERWISE, 
AND EXPRESSLY DISCLAIM ANY WARRANTY, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO IMPLIED 
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. McGraw-Hill and its licensors do not 
warrant or guarantee that the functions contained in the work will meet your requirements or that its operation will be uninterrupted or error 
free. Neither McGraw-Hill nor its licensors shall be liable to you or anyone else for any inaccuracy, error or omission, regardless of cause, in 
the work or for any damages resulting therefrom. McGraw-Hill has no responsibility for the content of any information accessed through the 
work. Under no circumstances shall McGraw-Hill and/or its licensors be liable for any indirect, incidental, special, punitive, consequential 
or similar damages that result from the use of or inability to use the work, even if any of them has been advised of the possibility of such 
damages. This limitation of liability shall apply to any claim or cause whatsoever whether such claim or cause arises in contract, tort or 
otherwise. 




Probability theory had its beginnings in the early seventeenth century as a result of investigations 
of various games of chance. Since then many leading mathematicians and scientists have made contributions
to this theory. However, despite its long and active history, probability theory was not
axiomatized until the twentieth century. This axiomatic development, called modern probability 
theory, was then able to make the concepts of probability precise and place them on a firm 
mathematical foundation. 

This book is designed for an introductory course in probability with high school algebra as the 
main prerequisite. It can serve as a text for such a course, or as a supplement to all current 
comparable texts. The book should also prove to be useful as a supplement to texts and courses in 
statistics. Furthermore, as the book is complete and self-contained it can easily be used for 
self-study. 

This new edition includes and expands the content of the first edition. It begins with a chapter
on sets and their operations, followed by a chapter on techniques of counting. Next comes a
chapter on probability spaces, and then a chapter on conditional probability and independence. The 
fifth and main chapter is on random variables where we define expectation, variance, and standard 
deviation, and prove Chebyshev’s inequality and the law of large numbers. Although calculus is not 
a prerequisite, both discrete and continuous random variables are considered. We follow with a 
separate chapter on specific distributions, mainly the binomial, normal, and Poisson distributions.
Here the central limit theorem is given in the context of the normal approximation to the
binomial distribution. The seventh and last chapter offers a thorough elementary treatment of 
Markov chains with applications. 

This new edition also has two new appendixes. The first is on descriptive statistics where 
expectation, variance, and standard deviation are again defined, but now in the context of statistics.
This appendix also treats bivariate data, including scatterplots, the correlation coefficient, and
methods of least squares. The second appendix discusses the chi-square distribution and various 
applications in the context of testing hypotheses. These two new appendixes motivate many of the 
concepts which appear in the chapters on probability, and also make the book even more useful as a 
supplement to texts and courses in statistics. 

The positive qualities that distinguished the first edition have been retained. Each chapter begins 
with clear statements of pertinent definitions, principles, and theorems together with illustrative 
and other descriptive material. This is followed by graded sets of solved and supplementary 
problems. The solved problems serve to illustrate and amplify the theory, and provide the repetition 
of basic principles so vital to effective learning. Proofs of most of the theorems are included among the
solved problems. The supplementary problems serve as a complete review of the material of each 
chapter. 

Finally, we wish to thank the staff of McGraw-Hill, especially Barbara Gilson and Maureen 
Walker, for their excellent cooperation. 


SEYMOUR LIPSCHUTZ 
Temple University 


Marc Lars Lipson 
University of Georgia 

Set Theory


1.1 INTRODUCTION 


This chapter treats some of the elementary ideas and concepts of set theory which are necessary 
for a modern introduction to probability theory. 


1.2 SETS AND ELEMENTS, SUBSETS 


A set may be viewed as any well-defined collection of objects, which are called the elements or
members of the set. We usually use capital letters, A, B, X, Y, ... to denote sets, and lowercase
letters, a, b, x, y, ... to denote elements of sets. Synonyms for set are class, collection, and family.

The statement that an element a belongs to a set S is written

$$a \in S$$

(Here $\in$ is the symbol meaning “is an element of”.) We also write

$$a, b \in S$$

when both a and b belong to S.

Suppose every element of a set A also belongs to a set B, that is, suppose $a \in A$ implies
$a \in B$. Then A is called a subset of B, or A is said to be contained in B, which is written as

$$A \subseteq B \quad \text{or} \quad B \supseteq A$$

Two sets are equal if they both have the same elements or, equivalently, if each is contained in the
other. That is,

$$A = B \quad \text{if and only if} \quad A \subseteq B \text{ and } B \subseteq A$$

The negations of $a \in A$, $A \subseteq B$, and $A = B$ are written $a \notin A$, $A \not\subseteq B$, and $A \neq B$, respectively.


Remark 1: It is common practice in mathematics to put a vertical line “|” or slanted line “/”
through a symbol to indicate the opposite or negative meaning of the symbol.

Remark 2: The statement $A \subseteq B$ does not exclude the possibility that A = B. In fact, for any set
A, we have $A \subseteq A$ since, trivially, every element in A belongs to A. However, if $A \subseteq B$ and $A \neq B$,
then we say that A is a proper subset of B (sometimes written $A \subset B$).


Remark 3: Suppose every element of a set A belongs to a set B, and every element of B belongs
to a set C. Then clearly every element of A belongs to C. In other words, if $A \subseteq B$ and $B \subseteq C$, then
$A \subseteq C$.


The above remarks yield the following theorem.

Theorem 1.1: Let A, B, C be any sets. Then:
(i) $A \subseteq A$.
(ii) If $A \subseteq B$ and $B \subseteq A$, then A = B.
(iii) If $A \subseteq B$ and $B \subseteq C$, then $A \subseteq C$.


Specifying Sets 
There are essentially two ways to specify a particular set. One way, if possible, is to list its 
elements. For example, 
A = {1, 3, 5, 7, 9}


means A is the set consisting of the numbers 1, 3, 5, 7, and 9. Note that the elements of the set are
separated by commas and enclosed in braces { }. This is called the tabular form or roster method of
a set.

The second way, called the set-builder form or property method, is to state those properties which 
characterize the elements in the set, that is, properties held by the members of the set but not by 
nonmembers. Consider, for example, the expression 


B = {x:x is an even integer, x > 0} 

which is read: 
“B is the set of x such that x is an even integer and x > 0” 

It denotes the set B whose elements are positive even integers. A letter, usually x, is used to denote 
a typical member of the set; the colon is read as “such that” and the comma as “and.”

EXAMPLE 1.1 
(a) The above set A can also be written as 

A = {x:x is an odd positive integer, x < 10} 

We cannot list all the elements of the above set B, but we frequently specify the set by writing

B = {2, 4, 6, ...}

where we assume everyone knows what we mean. Observe that $9 \in A$ but $9 \notin B$. Also $6 \in B$, but
$6 \notin A$.


(b) Consider the sets
A = {1, 3, 5, 7, 9}, B = {1, 2, 3, 4, 5}, C = {3, 5}
Then $C \subseteq A$ and $C \subseteq B$ since 3 and 5, the elements of C, are also members of A and B. On the other hand,
$A \not\subseteq B$ since $7 \in A$ but $7 \notin B$, and $B \not\subseteq A$ since $2 \in B$ but $2 \notin A$.


(c) Suppose a die is tossed. The possible “number” or “points” which appears on the uppermost face of the 
die belongs to the set {1, 2, 3, 4, 5, 6}. Now suppose a die is tossed and an even number appears. Then 
the outcome is a member of the set {2, 4, 6} which is a (proper) subset of the set {1, 2, 3, 4, 5, 6} of all possible 
outcomes. 
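
The membership and subset relations of Example 1.1 can be checked mechanically. The following is only an illustrative sketch using Python's built-in set type; the names `A_builder`, `B_sample`, and `B_b` are assumptions introduced here, and `B_sample` is a finite stand-in for the infinite set B = {2, 4, 6, ...}.

```python
# Sets from Example 1.1, written as Python sets (an illustrative sketch).
A = {1, 3, 5, 7, 9}
A_builder = {x for x in range(1, 10) if x % 2 == 1}   # set-builder form of A

# B = {2, 4, 6, ...} is infinite; B_sample is a finite stand-in (assumption).
B_sample = {x for x in range(1, 21) if x % 2 == 0}

# Sets from part (b); B_b is a hypothetical name for the set called B there.
B_b = {1, 2, 3, 4, 5}
C = {3, 5}

print(A == A_builder)          # True: both forms describe the same set
print(9 in A, 9 in B_sample)   # True False
print(6 in B_sample, 6 in A)   # True False
print(C <= A, C <= B_b)        # True True: C is a subset of both
print(A <= B_b, B_b <= A)      # False False: neither contains the other
print(C < A)                   # True: C is a proper subset of A
```

Here `<=` plays the role of the subset relation and `<` the role of proper subset.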

Special Symbols, Real Line R, Intervals


Some sets occur very often in mathematics, and so we use special symbols for them. Some such 
symbols follow: 


N = the natural numbers or positive integers:
{1, 2, 3, ...}
Z = all integers, positive, negative, and zero:
{..., -2, -1, 0, 1, 2, ...}
R = the real numbers


Thus we have $N \subseteq Z \subseteq R$.

The set R of real numbers plays an important role in probability theory since such numbers are 
used for numerical data. We assume the reader is familiar with the graphical representation of R as 
points on a straight line, as pictured in Fig. 1-1. We refer to such a line as the real line or the real 
line R. 


[Fig. 1-1: Real Line R]


Important subsets of R are the intervals which are denoted and defined as follows (where a and 
b are real numbers with a < b): 

Open interval from a to b = (a, b) = $\{x : a < x < b\}$

Closed interval from a to b = [a, b] = $\{x : a \le x \le b\}$

Open-closed interval from a to b = (a, b] = $\{x : a < x \le b\}$

Closed-open interval from a to b = [a, b) = $\{x : a \le x < b\}$

The numbers a and b are called the endpoints of the interval. The word “open” and a parenthesis
“(” or “)” are used to indicate that an endpoint does not belong to the interval, whereas the word
“closed” and a bracket “[” or “]” are used to indicate that an endpoint belongs to the interval.


Universal Set and Empty Set 


All sets under investigation in any application of set theory are assumed to be contained in some 
large fixed set called the universal set or universe of discourse. For example, in plane geometry, the 
universal set consists of all the points in the plane; in human population studies, the universal set 
consists of all the people in the world. We will let 


U 


denote the universal set unless otherwise stated or implied. 
Given a universal set U and a property P, there may be no elements in U which have the property 
P. For example, the set 


$$S = \{x : x \text{ is a positive integer},\ x^2 = 3\}$$

has no elements since no positive integer has the required property. Such a set with no elements is
called the empty set or null set, and is denoted by

$$\emptyset$$


There is only one empty set: If S and T are both empty, then S = T since they have exactly the same 
elements, namely, none. 

The empty set $\emptyset$ is also regarded as a subset of every other set. Accordingly, we have the
following simple result which we state formally:


Theorem 1.2: For any set A, we have $\emptyset \subseteq A \subseteq U$.


Disjoint Sets 


Two sets A and B are said to be disjoint if they have no elements in common. Consider, for 
example, the sets 


A = {1, 2}, B = {2, 4, 6}, C = {4, 5, 6, 7}


Observe that A and B are not disjoint since each contains the element 2, and B and C are not disjoint 
since each contains the element 4, among others. On the other hand, A and C are disjoint since they 
have no element in common. We note that if A and B are disjoint, then neither is a subset of the 
other (unless one is the empty set). 
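
As a small sketch of the disjointness example above, Python's `set.isdisjoint` method reports whether two sets share an element. The variable `empty` is a name introduced here only for illustration.

```python
A = {1, 2}
B = {2, 4, 6}
C = {4, 5, 6, 7}
empty = set()   # the empty set

print(A.isdisjoint(B))      # False: both contain 2
print(B.isdisjoint(C))      # False: both contain 4 and 6
print(A.isdisjoint(C))      # True: no common element
print(empty <= A)           # True: the empty set is a subset of A
print(empty.isdisjoint(A))  # True: the empty set is disjoint from every set
```

The last two lines show why the parenthetical exception is needed: the empty set is both disjoint from A and a subset of A.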


1.3 VENN DIAGRAMS


A Venn diagram is a pictorial representation of sets where sets are represented by enclosed areas 
in the plane. The universal set U is represented by the points in a rectangle, and the other sets are 
represented by disks lying within the rectangle. If $A \subseteq B$, then the disk representing A will be entirely
within the disk representing B, as in Fig. 1-2(a). If A and B are disjoint, that is, have no elements in 
common, then the disk representing A will be separated from the disk representing B, as in
Fig. 1-2(b).

[Fig. 1-2: Venn diagrams. (a) $A \subseteq B$. (b) A and B are disjoint. (c)]

On the other hand, if A and B are two arbitrary sets, it is possible that some elements are in A but
not in B, some elements are in B but not in A, some are in both A and B, and some are in neither A
nor B; hence, in general, we represent A and B as in Fig. 1-2(c).


1.4 SET OPERATIONS


This section defines a number of set operations, including the basic operations of union, 
intersection, and complement. 

Union and Intersection


The union of two sets A and B, denoted by $A \cup B$, is the set of all elements which belong to A or
to B, that is,

$$A \cup B = \{x : x \in A \text{ or } x \in B\}$$

Here, “or” is used in the sense of and/or. Figure 1-3(a) is a Venn diagram in which $A \cup B$ is
shaded.

The intersection of two sets A and B, denoted by $A \cap B$, is the set of all elements which belong
to both A and B, that is,

$$A \cap B = \{x : x \in A \text{ and } x \in B\}$$

Figure 1-3(b) is a Venn diagram in which $A \cap B$ is shaded.

[Fig. 1-3: (a) $A \cup B$ is shaded. (b) $A \cap B$ is shaded.]

Recall that sets A and B are said to be disjoint if they have no elements in common or, using the
definition of intersection, if $A \cap B = \emptyset$, the empty set. If

$$S = A \cup B \quad \text{and} \quad A \cap B = \emptyset$$

then S is called the disjoint union of A and B.


EXAMPLE 1.2 
(a) Let A = {1, 2, 3, 4}, B = {3, 4, 5, 6, 7}, C = {2, 3, 8, 9}. Then
$A \cup B = \{1, 2, 3, 4, 5, 6, 7\}$, $A \cup C = \{1, 2, 3, 4, 8, 9\}$, $B \cup C = \{2, 3, 4, 5, 6, 7, 8, 9\}$,
$A \cap B = \{3, 4\}$, $A \cap C = \{2, 3\}$, $B \cap C = \{3\}$


(b) Let U be the set of students at a university, and let M and F denote, respectively, the sets of male and female
students. Then U is the disjoint union of M and F, that is,

$$U = M \cup F \quad \text{and} \quad M \cap F = \emptyset$$

This comes from the fact that every student in U is either in M or in F, and clearly no students belong to
both M and F, that is, M and F are disjoint.

The following properties of the union and intersection should be noted: 


(i) Every element x in $A \cap B$ belongs to both A and B; hence, x belongs to A and x belongs to
B. Thus, $A \cap B$ is a subset of A and of B, that is,

$$A \cap B \subseteq A \quad \text{and} \quad A \cap B \subseteq B$$

(ii) An element x belongs to the union $A \cup B$ if x belongs to A or x belongs to B; hence, every
element in A belongs to $A \cup B$, and every element in B belongs to $A \cup B$. That is,

$$A \subseteq A \cup B \quad \text{and} \quad B \subseteq A \cup B$$


We state the above results formally. 

Theorem 1.3: For any sets A and B, we have

$$A \cap B \subseteq A \subseteq A \cup B \quad \text{and} \quad A \cap B \subseteq B \subseteq A \cup B$$


The operation of set inclusion is closely related to the operations of union and intersection, as
shown by the following theorem (proved in Problem 1.16).


Theorem 1.4: The following are equivalent: $A \subseteq B$, $A \cap B = A$, $A \cup B = B$.

Other conditions equivalent to $A \subseteq B$ are given in Problem 1.55.
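
The unions and intersections of Example 1.2(a), together with Theorems 1.3 and 1.4, can be spot-checked on concrete sets. This is a minimal sketch using Python's `|` (union) and `&` (intersection) operators; the set `D` is a hypothetical subset of A introduced here to illustrate Theorem 1.4, and a check on particular sets is of course not a proof.

```python
A = {1, 2, 3, 4}
B = {3, 4, 5, 6, 7}
C = {2, 3, 8, 9}

print(sorted(A | B))   # [1, 2, 3, 4, 5, 6, 7]
print(sorted(B | C))   # [2, 3, 4, 5, 6, 7, 8, 9]
print(sorted(A & B))   # [3, 4]
print(sorted(B & C))   # [3]

# Theorem 1.3: the intersection sits inside each set, and each set inside the union.
print((A & B) <= A <= (A | B))   # True
print((A & B) <= B <= (A | B))   # True

# Theorem 1.4 for a hypothetical subset D of A.
D = {3, 4}
print(D <= A, (D & A) == D, (D | A) == A)   # True True True
```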


Complements, Difference, Symmetric Difference 


Recall that all sets under consideration at a particular time are subsets of a fixed universal set 
U. The absolute complement or, simply, complement of a set A, denoted by $A^c$, is the set of elements
which belong to U but which do not belong to A, that is,

$$A^c = \{x : x \in U,\ x \notin A\}$$

Some texts denote the complement of A by $A'$ or $\bar{A}$. Figure 1-4(a) is a Venn diagram in which $A^c$ is
shaded.

The relative complement of a set B with respect to a set A or, simply, the difference between A and
B, denoted by $A \setminus B$, is the set of elements which belong to A but which do not belong to B, that is,

$$A \setminus B = \{x : x \in A,\ x \notin B\}$$

The set $A \setminus B$ is read “A minus B”. Some texts denote $A \setminus B$ by $A - B$ or $A \sim B$. Figure 1-4(b) is
a Venn diagram in which $A \setminus B$ is shaded.

The symmetric difference of the sets A and B, denoted by $A \oplus B$, consists of those elements which
belong to A or B, but not both. That is,

$$A \oplus B = (A \cup B) \setminus (A \cap B) \quad \text{or} \quad A \oplus B = (A \setminus B) \cup (B \setminus A)$$

Figure 1-4(c) is a Venn diagram in which $A \oplus B$ is shaded.

[Fig. 1-4: (a) $A^c$ is shaded. (b) $A \setminus B$ is shaded. (c) $A \oplus B$ is shaded.]


EXAMPLE 1.3 Let U = N = {1, 2, 3, ...} be the universal set, and let
A = {1, 2, 3, 4}, B = {3, 4, 5, 6, 7}, C = {2, 3, 8, 9}, E = {2, 4, 6, ...}
[Here E is the set of even positive integers.] Then
$A^c = \{5, 6, 7, ...\}$, $B^c = \{1, 2, 8, 9, 10, ...\}$, $E^c = \{1, 3, 5, ...\}$
That is, $E^c$ is the set of odd positive integers. Also
$A \setminus B = \{1, 2\}$, $A \setminus C = \{1, 4\}$, $B \setminus C = \{4, 5, 6, 7\}$, $A \setminus E = \{1, 3\}$,
$B \setminus A = \{5, 6, 7\}$, $C \setminus A = \{8, 9\}$, $C \setminus B = \{2, 8, 9\}$, $E \setminus A = \{6, 8, 10, ...\}$
Furthermore
$A \oplus B = (A \setminus B) \cup (B \setminus A) = \{1, 2, 5, 6, 7\}$, $B \oplus C = \{2, 4, 5, 6, 7, 8, 9\}$,
$A \oplus C = (A \setminus C) \cup (C \setminus A) = \{1, 4, 8, 9\}$, $A \oplus E = \{1, 3, 6, 8, 10, ...\}$

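
The computations of Example 1.3 can be reproduced with Python's `-` (difference) and `^` (symmetric difference) operators. The sketch below makes one loud assumption: since N is infinite, the universal set is truncated to U = {1, ..., 20}, so complements and sets involving E are only correct up to 20. The helper `complement` is a name introduced here.

```python
# Assumption: U = {1, ..., 20} is a finite stand-in for the universal set N.
U = set(range(1, 21))
A = {1, 2, 3, 4}
B = {3, 4, 5, 6, 7}
C = {2, 3, 8, 9}
E = {x for x in U if x % 2 == 0}   # even positive integers up to 20

def complement(s):
    """Absolute complement of s relative to the (truncated) universal set U."""
    return U - s

print(sorted(complement(A))[:3])   # [5, 6, 7]
print(sorted(A - B))               # [1, 2]
print(sorted(C - B))               # [2, 8, 9]
print(sorted(A ^ B))               # [1, 2, 5, 6, 7]
print(sorted(B ^ C))               # [2, 4, 5, 6, 7, 8, 9]

# Both descriptions of the symmetric difference agree.
print((A ^ B) == (A | B) - (A & B) == (A - B) | (B - A))   # True
```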
Algebra of Sets 


Sets under the operations of union, intersection, and complement satisfy various laws (identities) 
which are listed in Table 1-1. In fact, we formally state: 


Theorem 1.5: Sets satisfy the laws in Table 1-1. 
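
Table 1-1 Laws of the Algebra of Sets

Idempotent Laws
1a. $A \cup A = A$          1b. $A \cap A = A$

Associative Laws
2a. $(A \cup B) \cup C = A \cup (B \cup C)$          2b. $(A \cap B) \cap C = A \cap (B \cap C)$

Commutative Laws
3a. $A \cup B = B \cup A$          3b. $A \cap B = B \cap A$

Distributive Laws
4a. $A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$          4b. $A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$

Identity Laws
5a. $A \cup \emptyset = A$          5b. $A \cap U = A$
6a. $A \cup U = U$          6b. $A \cap \emptyset = \emptyset$

Involution Law
7. $(A^c)^c = A$

Complement Laws
8a. $A \cup A^c = U$          8b. $A \cap A^c = \emptyset$
9a. $U^c = \emptyset$          9b. $\emptyset^c = U$

DeMorgan’s Laws
10a. $(A \cup B)^c = A^c \cap B^c$          10b. $(A \cap B)^c = A^c \cup B^c$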


Remark: Each law in Table 1-1 follows from an equivalent logical law. Consider, for example, 
the proof of DeMorgan’s law: 


$$(A \cup B)^c = \{x : x \notin (A \text{ or } B)\} = \{x : x \notin A \text{ and } x \notin B\} = A^c \cap B^c$$

Here we use the equivalent (DeMorgan’s) logical law:

$$\neg(p \vee q) = \neg p \wedge \neg q$$

where $\neg$ means “not”, $\vee$ means “or”, and $\wedge$ means “and”. (Sometimes Venn diagrams are used to
illustrate the laws in Table 1-1 as in Problem 1.17.)
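
Besides logical arguments and Venn diagrams, some of the laws can be spot-checked by brute force on a small universal set. The sketch below is not a proof; it assumes a three-element universe U = {1, 2, 3} and tests the involution, DeMorgan, and distributive laws over every choice of subsets. The function names are introduced here for illustration.

```python
from itertools import combinations

U = frozenset({1, 2, 3})   # assumption: a small universal set for testing

def subsets(s):
    """Return all subsets of s as frozensets."""
    items = list(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def complement(a):
    """Complement of a relative to U."""
    return U - a

def laws_hold():
    """Check involution, DeMorgan, and distributive laws on all subsets of U."""
    all_subsets = subsets(U)
    for a in all_subsets:
        if complement(complement(a)) != a:
            return False
        for b in all_subsets:
            if complement(a | b) != complement(a) & complement(b):
                return False
            if complement(a & b) != complement(a) | complement(b):
                return False
            for c in all_subsets:
                if a | (b & c) != (a | b) & (a | c):
                    return False
                if a & (b | c) != (a & b) | (a & c):
                    return False
    return True

print(laws_hold())   # True
```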

Duality


The identities in Table 1-1 are arranged in pairs, as, for example, 2a and 2b. We now consider the
principle behind this arrangement. Let E be an equation of set algebra. The dual E* of E is the
equation obtained by replacing each occurrence of $\cup$, $\cap$, U, $\emptyset$ in E by $\cap$, $\cup$, $\emptyset$, U, respectively. For
example, the dual of

$$(U \cap A) \cup (B \cap A) = A \quad \text{is} \quad (\emptyset \cup A) \cap (B \cup A) = A$$

Observe that the pairs of laws in Table 1-1 are duals of each other. It is a fact of set algebra, called
the principle of duality, that, if any equation E is an identity, then its dual E* is also an identity.


1.5 FINITE AND COUNTABLE SETS


Sets can be finite or infinite. A set S is finite if S is empty or if S consists of exactly m elements 
where m is a positive integer; otherwise S is infinite. 


EXAMPLE 1.4 
(a) Let A denote the letters in the English alphabet, and let D denote the days of the week, that is, let 
A = {a, b, c, ..., y, z} and D = {Monday, Tuesday, ..., Sunday}


Then A and D are finite sets. Specifically, A has 26 elements and D has 7 elements. 


(b) Let R = {x:x is a river on the earth}. Although it may be difficult to count the number of rivers on the 
earth, R is still a finite set. 


(c) Let E be the set of even positive integers, and let I be the unit interval; that is, let 
E = {2, 4, 6, ...} and I = [0, 1] = $\{x : 0 \le x \le 1\}$


Then both E and I are infinite sets.


Countable Sets 


A set S is countable if S is finite or if the elements of S can be arranged in the form of a sequence, 
in which case S is said to be countably infinite. A set is uncountable if it is not countable. The above
set E of even integers is countably infinite, whereas it can be proven that the unit interval I = [0, 1] is 
uncountable. 


1.6 COUNTING ELEMENTS IN FINITE SETS, INCLUSION-EXCLUSION PRINCIPLE


The notation n(S) or |S| will denote the number of elements in a set S. Thus n(A) = 26 where
A consists of the letters in the English alphabet, and n(D) = 7 where D consists of the days of the
week. Also $n(\emptyset) = 0$, since the empty set has no elements.

The following lemma applies. 


Lemma 1.6: Suppose A and B are finite disjoint sets. Then $A \cup B$ is finite and

$$n(A \cup B) = n(A) + n(B)$$

This lemma may be restated as follows:

Lemma 1.6: Suppose S is the disjoint union of finite sets A and B. Then S is finite and

$$n(S) = n(A) + n(B)$$

Proof: In counting the elements of $A \cup B$, first count the elements of A. There are n(A) of
these. The only other elements in $A \cup B$ are those that are in B but not in A. Since A and
B are disjoint, no element of B is in A. Thus, there are n(B) elements which are in B but not
in A. Accordingly, $n(A \cup B) = n(A) + n(B)$.


For any sets A and B, the set A is the disjoint union of $A \setminus B$ and $A \cap B$ (Problem 1.45). Thus,
Lemma 1.6 gives us the following useful result.

Corollary 1.7: Let A and B be finite sets. Then

$$n(A \setminus B) = n(A) - n(A \cap B)$$

That is, the number of elements in A but not in B is the number of elements in A minus the number
of elements in both A and B. For example, suppose an art class A has 20 students and 8 of the
students are also taking a biology class B. Then there are

20 - 8 = 12

students in the class A which are not in the class B.

Given any set A, we note that the universal set U is the disjoint union of A and $A^c$. Accordingly,
Lemma 1.6 also gives us the following result.

Corollary 1.8: Suppose A is a subset of a finite universal set U. Then

$$n(A^c) = n(U) - n(A)$$

For example, suppose a class U of 30 students has 18 full-time students. Then there are

30 - 18 = 12

part-time students in the class.


Inclusion-Exclusion Principle 
There is also a formula for $n(A \cup B)$, even when A and B are not disjoint, called the
inclusion-exclusion principle. Namely,

Theorem 1.9 (Inclusion-Exclusion Principle): Suppose A and B are finite sets. Then $A \cap B$ and
$A \cup B$ are finite and

$$n(A \cup B) = n(A) + n(B) - n(A \cap B)$$

That is, we find the number of elements in A or B (or both) by first adding n(A) and n(B)
(inclusion) and then subtracting $n(A \cap B)$ (exclusion) since its elements were counted twice.
We can apply this result to get a similar result for three sets.


Corollary 1.10: Suppose A, B, C are finite sets. Then $A \cup B \cup C$ is finite and

$$n(A \cup B \cup C) = n(A) + n(B) + n(C) - n(A \cap B) - n(A \cap C) - n(B \cap C) + n(A \cap B \cap C)$$


Mathematical induction (Section 1.9) may be used to further generalize this result to any finite 
number of finite sets. 
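
The two- and three-set formulas can be compared against a direct count. The following is a minimal sketch; the function names `n_union2` and `n_union3` are introduced here, and the sets A, B, C are borrowed from Example 1.2 purely as test data.

```python
def n_union2(a, b):
    """n(A u B) by the inclusion-exclusion principle (Theorem 1.9)."""
    return len(a) + len(b) - len(a & b)

def n_union3(a, b, c):
    """n(A u B u C) by the three-set formula (Corollary 1.10)."""
    return (len(a) + len(b) + len(c)
            - len(a & b) - len(a & c) - len(b & c)
            + len(a & b & c))

A = {1, 2, 3, 4}
B = {3, 4, 5, 6, 7}
C = {2, 3, 8, 9}

print(n_union2(A, B), len(A | B))         # 7 7
print(n_union3(A, B, C), len(A | B | C))  # 9 9
```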


EXAMPLE 1.5 Suppose list A contains the 30 students in a mathematics class and list B contains the 35 students 
in an English class, and suppose there are 20 names on both lists. Find the number of students: 


(a) Only on list A (c) On list A or B (or both) 
(b) Only on list B (d) On exactly one of the two lists 

(a) List A contains 30 names and 20 of them are on list B; hence 30 - 20 = 10 names are only on list A. That
is, by Corollary 1.7,

$$n(A \setminus B) = n(A) - n(A \cap B) = 30 - 20 = 10$$

(b) Similarly, there are 35 - 20 = 15 names only on list B. That is,

$$n(B \setminus A) = n(B) - n(A \cap B) = 35 - 20 = 15$$

(c) We seek $n(A \cup B)$. Note we are given that $n(A \cap B) = 20$.
One way is to use the fact that $A \cup B$ is the disjoint union of $A \setminus B$, $A \cap B$, and $B \setminus A$ (Problem 1.54),
which is pictured in Fig. 1-5 where we have also inserted the number of elements in each of the three sets
$A \setminus B$, $A \cap B$, $B \setminus A$. Thus

$$n(A \cup B) = 10 + 20 + 15 = 45$$

Alternately, by Theorem 1.9,

$$n(A \cup B) = n(A) + n(B) - n(A \cap B) = 30 + 35 - 20 = 45$$

In other words, we combine the two lists and then cross out the 20 names which appear twice.


(d) By (a) and (b), there are 10 + 15 = 25 names on exactly one of the two lists; so $n(A \oplus B) = 25$. Alternately,
by the Venn diagram in Fig. 1-5, there are 10 elements in $A \setminus B$, and 15 elements in $B \setminus A$; hence

$$n(A \oplus B) = 10 + 15 = 25$$


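[Fig. 1-5: Venn diagram of A and B showing 10 elements in $A \setminus B$, 20 in $A \cap B$, and 15 in $B \setminus A$. $A \cup B$ is shaded.]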
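
The counts in Example 1.5 can be reproduced with concrete data. In this sketch the student names are replaced by hypothetical ID numbers, chosen only so that list A has 30 entries, list B has 35, and exactly 20 appear on both.

```python
# Hypothetical student IDs (assumption): 1..30 on list A, 11..45 on list B.
list_a = set(range(1, 31))    # 30 students in the mathematics class
list_b = set(range(11, 46))   # 35 students in the English class

print(len(list_a & list_b))   # 20 names on both lists
print(len(list_a - list_b))   # (a) 10 only on list A
print(len(list_b - list_a))   # (b) 15 only on list B
print(len(list_a | list_b))   # (c) 45 on list A or B (or both)
print(len(list_a ^ list_b))   # (d) 25 on exactly one of the two lists
```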


1.7 PRODUCT SETS 


Consider two arbitrary sets A and B. The set of all ordered pairs (a, b) where $a \in A$ and $b \in B$
is called the product, or Cartesian product, of A and B. A short designation of this product is $A \times B$,
which is read “A cross B”. By definition,

$$A \times B = \{(a, b) : a \in A,\ b \in B\}$$

One frequently writes $A^2$ instead of $A \times A$.

We note that ordered pairs (a, b) and (c, d) are equal if and only if their first elements, a and c,
are equal and their second elements, b and d, are equal. That is,

$$(a, b) = (c, d) \quad \text{if and only if} \quad a = c \text{ and } b = d$$


EXAMPLE 1.6 R denotes the set of real numbers, and so $R^2 = R \times R$ is the set of ordered pairs of real numbers.
The reader is familiar with the geometrical representation of $R^2$ as points in the plane, as in Fig. 1-6. Here each
point P represents an ordered pair (a, b) of real numbers, and vice versa; the vertical line through P meets the x
axis at a, and the horizontal line through P meets the y axis at b. $R^2$ is frequently called the Cartesian plane.

[Fig. 1-6: Cartesian Plane $R^2$]


EXAMPLE 1.7 Let A = {1, 2} and B = {a, b, c}. Then
$A \times B = \{(1, a), (1, b), (1, c), (2, a), (2, b), (2, c)\}$
$B \times A = \{(a, 1), (a, 2), (b, 1), (b, 2), (c, 1), (c, 2)\}$
Also, $A \times A = \{(1, 1), (1, 2), (2, 1), (2, 2)\}$

There are two things worth noting in the above Example 1.7. First of all, $A \times B \neq B \times A$. The
Cartesian product deals with ordered pairs, so naturally the order in which the sets are considered is
important.

Secondly, using n(S) for the number of elements in a set S, we have:

$$n(A \times B) = 6 = 2 \cdot 3 = n(A) \cdot n(B)$$

In fact, $n(A \times B) = n(A) \cdot n(B)$ for any finite sets A and B. This follows from the observation that,
for each $a \in A$, there will be n(B) ordered pairs in $A \times B$ beginning with a. Hence, altogether there
will be n(A) times n(B) ordered pairs in $A \times B$.
We state the above result formally. 


Theorem 1.11: Suppose A and B are finite. Then $A \times B$ is finite and

$$n(A \times B) = n(A) \cdot n(B)$$


The concept of a product of sets can be extended to any finite number of sets in a natural
way. That is, for any sets $A_1, A_2, ..., A_m$, the set of all ordered m-tuples $(a_1, a_2, ..., a_m)$, where $a_1 \in A_1$,
$a_2 \in A_2$, ..., $a_m \in A_m$, is called the product of the sets $A_1, A_2, ..., A_m$, and is denoted by

$$A_1 \times A_2 \times \cdots \times A_m \quad \text{or} \quad \prod_{i=1}^{m} A_i$$

Just as we write $A^2$ instead of $A \times A$, so we write $A^m$ for $A \times A \times \cdots \times A$, where there are m
factors.

Furthermore, for finite sets $A_1, A_2, ..., A_m$, we have

$$n(A_1 \times A_2 \times \cdots \times A_m) = n(A_1)\, n(A_2) \cdots n(A_m)$$

That is, Theorem 1.11 may be easily extended, by induction, to the product of m sets.
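
Product sets and the counting rule of Theorem 1.11 can be illustrated with `itertools.product` from the Python standard library. This is only a sketch on the sets of Example 1.7; the names `AxB`, `BxA`, and `A3` are introduced here.

```python
from itertools import product

A = {1, 2}
B = {"a", "b", "c"}

AxB = set(product(A, B))        # ordered pairs (a, b) with a in A, b in B
BxA = set(product(B, A))        # ordered pairs (b, a)
A3 = set(product(A, repeat=3))  # A x A x A, i.e. all ordered triples from A

print((1, "a") in AxB, (1, "a") in BxA)   # True False: order matters
print(AxB == BxA)                         # False
print(len(AxB) == len(A) * len(B))        # True: 6 = 2 * 3
print(len(A3) == len(A) ** 3)             # True: 8 = 2^3
```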

1.8 CLASSES OF SETS, POWER SETS, PARTITIONS


Given a set S, we may wish to talk about some of its subsets. Thus, we would be considering a 
“set of sets”. Whenever such a situation arises, to avoid confusion, we will speak of a class of sets or 
a collection of sets. The words “subclass” and “subcollection” have meanings analogous to subset. 


EXAMPLE 1.8 Suppose S = {1, 2, 3, 4}. Let $\mathcal{A}$ be the class of subsets of S which contain exactly three elements
of S. Then

$$\mathcal{A} = [\{1, 2, 3\}, \{1, 2, 4\}, \{1, 3, 4\}, \{2, 3, 4\}]$$

The elements of $\mathcal{A}$ are the sets {1, 2, 3}, {1, 2, 4}, {1, 3, 4}, and {2, 3, 4}.
Let $\mathcal{B}$ be the class of subsets of S which contain the numeral 2 and two other elements of S. Then

$$\mathcal{B} = [\{1, 2, 3\}, \{1, 2, 4\}, \{2, 3, 4\}]$$

The elements of $\mathcal{B}$ are {1, 2, 3}, {1, 2, 4}, and {2, 3, 4}. Thus $\mathcal{B}$ is a subclass of $\mathcal{A}$. (To avoid confusion, we will
usually enclose the sets of a class in brackets instead of braces.)

Power Sets 


For a given set S, we may consider the class of all subsets of S. This class is called the power set
of S, and it will be denoted by $\mathcal{P}(S)$. If S is finite, then so is $\mathcal{P}(S)$. In fact, the number of elements
in $\mathcal{P}(S)$ is 2 raised to the power n(S); that is,

$$n(\mathcal{P}(S)) = 2^{n(S)}$$

(For this reason, the power set of S is sometimes denoted by $2^S$.) We emphasize that S and the empty
set $\emptyset$ belong to $\mathcal{P}(S)$ since they are subsets of S.

EXAMPLE 1.9 Suppose S = {1, 2, 3}. Then

$$\mathcal{P}(S) = [\emptyset, \{1\}, \{2\}, \{3\}, \{1, 2\}, \{1, 3\}, \{2, 3\}, S]$$

As expected from the above remark, $\mathcal{P}(S)$ has $2^3 = 8$ elements.
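
A power set can be generated by collecting the subsets of each size. The sketch below is one possible construction, using `itertools.combinations`; the function name `power_set` is introduced here for illustration.

```python
from itertools import combinations

def power_set(s):
    """Return the class of all subsets of the finite set s, as a list of frozensets."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

S = {1, 2, 3}
P = power_set(S)

print(len(P) == 2 ** len(S))   # True: 8 subsets
print(frozenset() in P)        # True: the empty set belongs to the power set
print(frozenset(S) in P)       # True: S itself belongs to the power set
```

Frozensets are used because ordinary Python sets are not hashable and so cannot themselves be collected into a set or compared by membership as conveniently.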


Partitions 


Let S be a nonempty set. A partition of S is a subdivision of S into nonoverlapping, nonempty 
subsets. Precisely, a partition of S is a collection $\{A_i\}$ of nonempty subsets of S such that


(i) Each a in S belongs to one of the $A_i$.
(ii) The sets of $\{A_i\}$ are mutually disjoint; that is, if
$A_i \neq A_j$, then $A_i \cap A_j = \emptyset$.


The subsets in a partition are called cells. Figure 1-7 is a Venn diagram of a partition of the
rectangular set S of points into five cells, $A_1, A_2, A_3, A_4, A_5$.


[Fig. 1-7]
EXAMPLE 1.10 Consider the following collections of subsets of S = {1, 2, 3, ..., 8, 9}:

(i) [{1, 3, 5}, {2, 6}, {4, 8, 9}]
(ii) [{1, 3, 5}, {2, 4, 6, 8}, {5, 7, 9}]
(iii) [{1, 3, 5}, {2, 4, 6, 8}, {7, 9}]


Then (i) is not a partition of S since 7 in S does not belong to any of the subsets. Furthermore, (ii) is not a 
partition of S since {1, 3,5} and {5,7, 9} are not disjoint. On the other hand, (iii) is a partition of S. 
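The two conditions in the definition of a partition translate directly into a test. The sketch below is an illustration only; the name `is_partition` is hypothetical, and the collection is first converted to a set of `frozenset` cells, so a cell that is written twice is treated as one cell (an assumption that matches reading the collection as a class of sets).

```python
def is_partition(cells, s):
    """Return True if `cells` is a partition of the nonempty set `s`."""
    family = {frozenset(c) for c in cells}   # repeated cells collapse into one
    if frozenset() in family:                # cells must be nonempty
        return False
    covered = set().union(*family)
    if covered != set(s):                    # (i) every element lies in some cell, and no cell leaves s
        return False
    # (ii) the cells are mutually disjoint exactly when no element is counted twice
    return sum(len(c) for c in family) == len(covered)

S = set(range(1, 10))
print(is_partition([{1, 3, 5}, {2, 6}, {4, 8, 9}], S))        # 7 is missing
print(is_partition([{1, 3, 5}, {2, 4, 6, 8}, {5, 7, 9}], S))  # 5 appears twice
print(is_partition([{1, 3, 5}, {2, 4, 6, 8}, {7, 9}], S))     # a partition
```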


Indexed Classes of Sets 
An indexed class of sets, usually presented in the form

{A_i : i ∈ I} or simply {A_i}

means that there is a set A_i assigned to each element i ∈ I. The set I is called the indexing set and the sets A_i are said to be indexed by I. The union of the sets A_i, written ∪_{i∈I} A_i or simply ∪_i A_i, consists of those elements which belong to at least one of the A_i; and the intersection of the sets A_i, written ∩_{i∈I} A_i or simply ∩_i A_i, consists of those elements which belong to every A_i.

When the indexing set is the set N of positive integers, the indexed class {A_1, A_2, ...} is called a sequence of sets. In such a case, we also write

∪_{i=1}^∞ A_i = A_1 ∪ A_2 ∪ ··· and ∩_{i=1}^∞ A_i = A_1 ∩ A_2 ∩ ···

for the union and intersection, respectively, of a sequence of sets.


Definition: A nonempty class 𝒜 of subsets of U is called an algebra (σ-algebra) of sets if it has the following two properties:

(i) The complement of any set in 𝒜 belongs to 𝒜.

(ii) The union of any finite (countable) number of sets in 𝒜 belongs to 𝒜.

That is, 𝒜 is closed under complements and finite (countable) unions.

It is simple to show (Problem 1.40) that any algebra (σ-algebra) of sets contains U and ∅ and is closed under finite (countable) intersections.
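For a finite universal set, the two closure properties can be tested by brute force. The following Python sketch is an illustration under that finiteness assumption; the name `is_algebra` is hypothetical. It checks only pairwise unions, which suffices because closure under the union of two sets extends to any finite number of sets by induction.

```python
def is_algebra(cls, universe):
    """Check whether the finite class `cls` is an algebra of subsets of `universe`."""
    U = frozenset(universe)
    family = {frozenset(s) for s in cls}
    if not family:                      # an algebra must be nonempty
        return False
    closed_under_complement = all(U - s in family for s in family)
    closed_under_union = all(a | b in family for a in family for b in family)
    return closed_under_complement and closed_under_union

U = {1, 2, 3}
print(is_algebra([set(), U], U))                 # the smallest algebra on U
print(is_algebra([set(), {1}, {2, 3}, U], U))    # generated by the single set {1}
print(is_algebra([{1}, U], U))                   # fails: the complement of {1} is missing
```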


1.9 MATHEMATICAL INDUCTION


An essential property of the set N = {1, 2,3, ...} of positive integers which is used in many proofs 
follows: 


Principle of Mathematical Induction I: Let A(n) be an assertion about the set N of positive integers, that is, A(n) is true or false for each integer n ≥ 1. Suppose A(n) has the following two properties:


(i) A(1) is true. 
(ii) A(n + 1) is true whenever A(n) is true. 
Then A(n) is true for every positive integer. 


We shall not prove this principle. In fact, this principle is usually given as one of the axioms when 
N is developed axiomatically. 
EXAMPLE 1.11 Let A(n) be the assertion that the sum of the first n odd numbers is n^2; that is,

A(n): 1 + 3 + 5 + ··· + (2n − 1) = n^2

(The nth odd number is 2n − 1 and the next odd number is 2n + 1.)
Observe that A(n) is true for n = 1 since

A(1): 1 = 1^2

Assuming A(n) is true, we add 2n + 1 to both sides of A(n), obtaining

1 + 3 + 5 + ··· + (2n − 1) + (2n + 1) = n^2 + (2n + 1) = (n + 1)^2

However, this is A(n + 1). That is, A(n + 1) is true assuming A(n) is true. By the principle of mathematical induction, A(n) is true for all n ≥ 1.
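A numerical check is no substitute for the induction argument, but it is a quick way to catch a misstated formula. The sketch below is illustrative only (the function name `sum_first_odds` and the cutoff 200 are arbitrary choices); it tests both the assertion and the algebraic identity used in the inductive step.

```python
def sum_first_odds(n):
    """Sum of the first n odd numbers 1 + 3 + ... + (2n - 1)."""
    return sum(2 * k - 1 for k in range(1, n + 1))

for n in range(1, 201):
    # The assertion A(n) itself.
    assert sum_first_odds(n) == n ** 2
    # The inductive step: adding the next odd number 2n + 1 to n^2 gives (n + 1)^2.
    assert n ** 2 + (2 * n + 1) == (n + 1) ** 2

print("A(n) holds for n = 1, ..., 200")
```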


There is another form of the principle of mathematical induction which is sometimes more 
convenient to use. Although it appears different, it is really equivalent to the above principle of 
induction. 


Principle of Mathematical Induction II: Let A(n) be an assertion about the set N of positive integers with the following two properties:

(i) A(1) is true.

(ii) A(n) is true whenever A(k) is true for 1 ≤ k < n.

Then A(n) is true for every positive integer. 

Remark: Sometimes one wants to prove that an assertion A is true for a set of integers of 
the form 

{a,a+1,a+2,...} 


where a is any integer, possibly 0. This can be done by simply replacing 1 by a in either of the above 
Principles of Mathematical Induction. 


Solved Problems 


SETS, ELEMENTS, SUBSETS 
1.1. List the elements of the following sets; here N = {1, 2, 3, ...}:
(a) A = {x : x ∈ N, 2 < x < 9}, (c) C = {x : x ∈ N, x + 5 = 2},
(b) B = {x : x ∈ N, x is even, x ≤ 15}, (d) D = {x : x ∈ N, x is a multiple of 5}
(a) A consists of the positive integers strictly between 2 and 9; hence A = {3, 4, 5, 6, 7, 8}.
(b) B consists of the even positive integers less than or equal to 15; hence B = {2, 4, 6, 8, 10, 12, 14}.


(c) There are no positive integers which satisfy the condition x +5 =2; hence C contains no 
elements. In other words, C = ©, the empty set. 


(d) D is infinite, so we cannot list all its elements. However, sometimes we write D = {5, 10, 15, 20, ...}
assuming everyone understands that we mean the multiples of 5. 


1.2. Which of these sets are equal: {r, s, t}, {t, s, r}, {s, r, t}, {t, r, s}?


They are all equal. Order does not change a set. 
1.3. Describe in words how you would prove each of the following:

(a) A is equal to B. (c) A is a proper subset of B.
(b) A is a subset of B. (d) A is not a subset of B.


(a) Show that each element of A also belongs to B, and then show that each element of B also belongs 
to A. 


(b) Show that each element of A also belongs to B. 


(c) Show that each element of A also belongs to B, and then show that at least one element of B is not 
in A. (Note that it is not necessary to show that more than one element of B is not in A.) 


(d) Show that one element of A is not in B. 


1.4. Show that A = {2, 3, 4, 5} is not a subset of B = {x : x ∈ N, x is even}.

It is necessary to show that at least one element in A does not belong to B. Now 3 ∈ A, but 3 ∉ B since B only consists of even integers. Hence A is not a subset of B.


1.5. Show that A = {3, 4, 5, 6} is a proper subset of C = {1, 2, 3, ..., 8, 9}.

Each element of A belongs to C; hence A ⊆ C. On the other hand, 1 ∈ C but 1 ∉ A; hence A ≠ C. Therefore, A is a proper subset of C.


1.6. Consider the following sets where U = {1, 2, 3, ..., 8, 9}:
∅, A = {1}, B = {1, 3}, C = {1, 5, 9}, D = {1, 2, 3, 4, 5}, E = {1, 3, 5, 7, 9}

Insert the correct symbol ⊆ or ⊈ between each pair of sets:

(a) ∅, A (c) B, C (e) C, D (g) D, E

(b) A, B (d) B, E (f) C, E (h) D, U

(a) ∅ ⊆ A since ∅ is a subset of every set.

(b) A ⊆ B since 1 is the only element of A and it belongs to B.

(c) B ⊈ C since 3 ∈ B but 3 ∉ C.

(d) B ⊆ E since the elements of B also belong to E.

(e) C ⊈ D since 9 ∈ C but 9 ∉ D.

(f) C ⊆ E since the elements of C also belong to E.

(g) D ⊈ E since 2 ∈ D but 2 ∉ E.

(h) D ⊆ U since the elements of D also belong to U.


1.7. Determine which of the following sets are equal: ∅, {0}, {∅}.

Each is different from the other. The set {0} contains one element, the number zero. The set ∅ contains no element; it is the empty set. The set {∅} also contains one element, the null set.


1.8. A pair of dice is tossed and the sum of the faces is recorded. Find the smallest set S which includes all possible outcomes.

The faces of each die are the numbers 1 to 6. Thus, no sum can be less than 2 nor greater than 12.
Also, every number between 2 and 12 could occur. Thus 


S = {2,3,4,5, 6,7, 8, 9, 10, 11, 12} 
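The set of sums can also be generated by running through all 36 ordered pairs of faces. This is a small illustrative sketch in Python, with the variable names chosen here for convenience.

```python
from itertools import product

faces = range(1, 7)                       # the faces of one die
# Collect the sum for each of the 6 * 6 ordered pairs of faces.
S = {a + b for a, b in product(faces, repeat=2)}

print(sorted(S))
print(min(S), max(S))                     # no sum is below 2 or above 12
```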
SET OPERATIONS

1.9. Let U = {1, 2, ..., 9} be the universal set, and let

A = {1, 2, 3, 4, 5}, C = {4, 5, 6, 7, 8, 9}, E = {2, 4, 6, 8},
B = {4, 5, 6, 7}, D = {1, 3, 5, 7, 9}, F = {1, 5, 9}

Find:
(a) A ∪ B and A ∩ B (c) A ∪ C and A ∩ C (e) E ∪ E and E ∩ E
(b) B ∪ D and B ∩ D (d) D ∪ E and D ∩ E (f) D ∪ F and D ∩ F


Recall that the union X ∪ Y consists of those elements in either X or in Y (or both), and the intersection X ∩ Y consists of those elements in both X and Y.

(a) A ∪ B = {1, 2, 3, 4, 5, 6, 7}, A ∩ B = {4, 5}

(b) B ∪ D = {1, 3, 4, 5, 6, 7, 9}, B ∩ D = {5, 7}

(c) A ∪ C = {1, 2, 3, 4, 5, 6, 7, 8, 9} = U, A ∩ C = {4, 5}

(d) D ∪ E = {1, 2, 3, 4, 5, 6, 7, 8, 9} = U, D ∩ E = ∅

(e) E ∪ E = {2, 4, 6, 8} = E, E ∩ E = {2, 4, 6, 8} = E

(f) D ∪ F = {1, 3, 5, 7, 9} = D, D ∩ F = {1, 5, 9} = F

(Observe that F ⊆ D; hence, by Theorem 1.4, we must have D ∪ F = D and D ∩ F = F.)


1.10. Consider the sets in the preceding Problem 1.9. Find:
(a) A^c, B^c, D^c, E^c; (b) A\B, B\A, D\E, F\D; (c) A ⊕ B, C ⊕ D, E ⊕ F.

(a) The complement X^c consists of those elements in the universal set U which do not belong to X. Hence:

A^c = {6, 7, 8, 9}, B^c = {1, 2, 3, 8, 9}, D^c = {2, 4, 6, 8} = E, E^c = {1, 3, 5, 7, 9} = D

(Note D and E are complements; that is, D ∪ E = U and D ∩ E = ∅.)

(b) The difference X\Y consists of the elements in X which do not belong to Y. Therefore

A\B = {1, 2, 3}, B\A = {6, 7}, D\E = {1, 3, 5, 7, 9} = D, F\D = ∅

(Since D and E are disjoint, we must have D\E = D; and since F ⊆ D, we must have F\D = ∅.)

(c) The symmetric difference X ⊕ Y consists of the elements in X or in Y but not in both X and Y. In other words, X ⊕ Y = (X\Y) ∪ (Y\X). Hence:

A ⊕ B = {1, 2, 3, 6, 7}, C ⊕ D = {1, 3, 4, 6, 8}, E ⊕ F = {2, 4, 6, 8, 1, 5, 9} = E ∪ F

(Since E and F are disjoint, we must have E ⊕ F = E ∪ F.)
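The operations used in Problems 1.9 and 1.10 correspond to Python's built-in set operators: `|` for union, `&` for intersection, `-` for difference, and `^` for symmetric difference. The sketch below is offered only as a way to re-check a few of the answers; the complement is written as a difference from U because Python has no complement operator.

```python
U = set(range(1, 10))
A = {1, 2, 3, 4, 5}
B = {4, 5, 6, 7}
C = {4, 5, 6, 7, 8, 9}
D = {1, 3, 5, 7, 9}
E = {2, 4, 6, 8}
F = {1, 5, 9}

assert A | B == {1, 2, 3, 4, 5, 6, 7} and A & B == {4, 5}
assert D | E == U and D & E == set()          # D and E are complements
assert U - A == {6, 7, 8, 9}                  # the complement of A
assert A - B == {1, 2, 3} and B - A == {6, 7}
assert A ^ B == {1, 2, 3, 6, 7}
assert E ^ F == E | F                         # E and F are disjoint

# X (+) Y = (X \ Y) U (Y \ X) for every pair of the sets above.
sets = (A, B, C, D, E, F)
identity_holds = all(X ^ Y == (X - Y) | (Y - X) for X in sets for Y in sets)
print(identity_holds)
```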


1.11. Show that we can have A ∩ B = A ∩ C without B = C.

Let A = {1, 2}, B = {2, 3}, C = {2, 4}. Then A ∩ B = {2} and A ∩ C = {2}; hence
A ∩ B = A ∩ C.
However, B ≠ C.


1.12. Show that we can have A ∪ B = A ∪ C without B = C.

Let A = {1, 2}, B = {1, 3}, C = {2, 3}. Then A ∪ B = {1, 2, 3} and A ∪ C = {1, 2, 3}; hence
A ∪ B = A ∪ C.
However, B ≠ C.
1.13. Prove: B\A = B ∩ A^c. Thus, the set operation of difference can be written in terms of the operations of intersection and complement.

B\A = {x : x ∈ B, x ∉ A} = {x : x ∈ B, x ∈ A^c} = B ∩ A^c


1.14. Consider the following intervals:

A = [−3, 5), B = (3, 8), C = [0, 4], D = (−7, −3]

(a) Rewrite each interval in set-builder form.
(b) Find: A ∩ B, A ∩ C, A ∩ D, B ∩ C, B ∩ D, C ∩ D.

(a) Recall that a parenthesis means that the endpoint does not belong to the interval, and that a bracket means that the endpoint does belong to the interval. Thus:

A = {x : −3 ≤ x < 5}, C = {x : 0 ≤ x ≤ 4},
B = {x : 3 < x < 8}, D = {x : −7 < x ≤ −3}

(b) Using the short notation for intervals, we have:

A ∩ B = (3, 5), A ∩ C = [0, 4], A ∩ D = {−3},
B ∩ C = (3, 4], B ∩ D = ∅, C ∩ D = ∅


1.15. Under what condition will the intersection of two intervals be an interval?

The intersection of two intervals will always be an interval, or a singleton set {a}, or the empty set ∅. Thus, if we view

[a, a] = {x : a ≤ x ≤ a} = {a} and (a, a) = {x : a < x < a} = ∅

as intervals, then the intersection of any two intervals is always an interval.


1.16. Prove Theorem 1.4: The following are equivalent:

A ⊆ B, A ∩ B = A, A ∪ B = B

The theorem can be reduced to the following two cases:

(a) A ⊆ B is equivalent to A ∩ B = A.
(b) A ⊆ B is equivalent to A ∪ B = B.

(a) Suppose A ⊆ B and let x ∈ A. Then x ∈ B, and so x ∈ A ∩ B. Thus, A ⊆ A ∩ B. Moreover, by Theorem 1.3, (A ∩ B) ⊆ A. Accordingly, A ∩ B = A.

On the other hand, suppose A ∩ B = A and let x ∈ A. Then x ∈ A ∩ B; hence x ∈ A and x ∈ B. Therefore, A ⊆ B.

Both results show that A ⊆ B is equivalent to A ∩ B = A.

(b) Suppose again that A ⊆ B. Let x ∈ (A ∪ B). Then x ∈ A or x ∈ B. If x ∈ A, then x ∈ B because A ⊆ B. In either case x ∈ B. Thus, A ∪ B ⊆ B. By Theorem 1.3, B ⊆ A ∪ B. Accordingly, A ∪ B = B.

On the other hand, suppose A ∪ B = B and let x ∈ A. Then x ∈ A ∪ B, by definition of union of sets. However, A ∪ B = B; hence x ∈ B. Thus, A ⊆ B.

Both results show that A ⊆ B is equivalent to A ∪ B = B.

Thus, all three statements, A ⊆ B, A ∩ B = A, A ∪ B = B, are equivalent.




VENN DIAGRAMS, ALGEBRA OF SETS, DUALITY 
1.17. Illustrate DeMorgan’s Law (A ∪ B)^c = A^c ∩ B^c (proved in Section 1.4) using Venn diagrams.

Shade the area outside A ∪ B in a Venn diagram of sets A and B. This is shown in Fig. 1-8(a); hence the shaded area represents (A ∪ B)^c. Now shade the area outside A in a Venn diagram of A and B with strokes in one direction (///), and then shade the area outside B with strokes in another direction (\\\). This is shown in Fig. 1-8(b); hence the cross-hatched area (area where both lines are present) represents the intersection of A^c and B^c, that is, A^c ∩ B^c. Both (A ∪ B)^c and A^c ∩ B^c are represented by the same area; thus the Venn diagrams indicate (A ∪ B)^c = A^c ∩ B^c. (We emphasize that a Venn diagram is not a formal proof but it can indicate relationships between sets.)

(a) Shaded area: (A ∪ B)^c (b) Cross-hatched area: A^c ∩ B^c

Fig. 1-8


1.18. Prove the Distributive Law: A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C) [Theorem 1.5 (4b)].
By definition of union and intersection,

A ∩ (B ∪ C) = {x : x ∈ A, x ∈ B ∪ C}
= {x : x ∈ A, x ∈ B or x ∈ A, x ∈ C} = (A ∩ B) ∪ (A ∩ C)

Here we use the analogous logical law p ∧ (q ∨ r) ≡ (p ∧ q) ∨ (p ∧ r) where ∧ denotes “and” and ∨ denotes “or”.


1.19. Describe in words: (a) (A ∪ B)\(A ∩ B) and (b) (A\B) ∪ (B\A). Then prove they are the same set. (Thus, either one may be used to define the symmetric difference A ⊕ B.)

(a) (A ∪ B)\(A ∩ B) consists of the elements in A or B but not in both A and B.

(b) (A\B) ∪ (B\A) consists of the elements in A which are not in B, or the elements in B which are not in A.

Using X\Y = X ∩ Y^c and the laws in Table 1-1, including DeMorgan’s law, we obtain:

(A ∪ B)\(A ∩ B) = (A ∪ B) ∩ (A ∩ B)^c = (A ∪ B) ∩ (A^c ∪ B^c)
= (A ∩ A^c) ∪ (A ∩ B^c) ∪ (B ∩ A^c) ∪ (B ∩ B^c)
= ∅ ∪ (A ∩ B^c) ∪ (B ∩ A^c) ∪ ∅
= (A ∩ B^c) ∪ (B ∩ A^c) = (A\B) ∪ (B\A)


1.20. Write the dual of each set equation:
(a) (U ∩ A) ∪ (B ∩ A) = A (c) (A ∩ U) ∩ (∅ ∪ A^c) = ∅
(b) (A ∪ B ∪ C)^c = (A ∪ C)^c ∩ (A ∪ B)^c (d) (A ∩ U)^c ∩ A = ∅
Interchange ∩ and ∪ and also U and ∅ in each set equation:
(a) (∅ ∪ A) ∩ (B ∪ A) = A (c) (A ∪ ∅) ∪ (U ∩ A^c) = U
(b) (A ∩ B ∩ C)^c = (A ∩ C)^c ∪ (A ∩ B)^c (d) (A ∪ ∅)^c ∪ A = U
FINITE SETS AND COUNTING PRINCIPLE, COUNTABLE SETS

1.21. Determine which of the following sets are finite:

(a) A = {seasons in the year} (d) D = {odd integers}
(b) B = {states in the United States} (e) E = {positive integral divisors of 12}
(c) C = {positive integers less than 1} (f) F = {cats living in the United States}


(a) A is finite since there are four seasons in the year, that is, n(A) = 4. 

(b) B is finite because there are 50 states in the United States, that is, n(B) = 50.

(c) There are no positive integers less than 1; hence C is empty. Thus, C is finite and n(C) = 0.
(d) D is infinite.

(e) The positive integer divisors of 12 are 1, 2, 3, 4, 6, 12. Hence E is finite and n(E) = 6.


(f) Although it may be difficult to find the number of cats living in the United States, there is still a finite 
number of them at any point in time. Hence F is finite. 


1.22. Suppose 50 science students are polled to see whether or not they have studied French (F) or German (G), yielding the following data:

25 studied French, 20 studied German, 5 studied both

Find the number of students who: (a) studied only French, (b) did not study German, (c) studied French or German, (d) studied neither language.

(a) Here 25 studied French, and 5 of them also studied German; hence 25 − 5 = 20 students studied only French. That is, by Corollary 1.7,

n(F\G) = n(F) − n(F ∩ G) = 25 − 5 = 20

(b) There are 50 students of whom 20 studied German; hence 50 − 20 = 30 did not study German. That is, by Corollary 1.8,

n(G^c) = n(U) − n(G) = 50 − 20 = 30

(c) By the inclusion-exclusion principle in Theorem 1.9,

n(F ∪ G) = n(F) + n(G) − n(F ∩ G) = 25 + 20 − 5 = 40

That is, 40 students studied French or German.

(d) The set F^c ∩ G^c consists of the students who studied neither language. By DeMorgan’s law, F^c ∩ G^c = (F ∪ G)^c. By (c), 40 studied at least one of the languages; hence

n(F^c ∩ G^c) = n(U) − n(F ∪ G) = 50 − 40 = 10


That is, 10 students studied neither language. 
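The four counts can be re-derived by building concrete sets that match the survey data. The labelling below is purely hypothetical (students are numbered 0 to 49, and the particular numbers assigned to F and G are arbitrary); only the sizes n(U) = 50, n(F) = 25, n(G) = 20, and n(F ∩ G) = 5 come from the problem.

```python
# Hypothetical labelling of the 50 polled students.
U = set(range(50))
F = set(range(0, 25))     # 25 students who studied French
G = set(range(20, 40))    # 20 students who studied German; 20..24 studied both

assert len(F & G) == 5

only_french = len(F - G)              # (a) n(F \ G)
not_german = len(U - G)               # (b) n(G^c)
french_or_german = len(F | G)         # (c) n(F U G)
neither = len(U - (F | G))            # (d) n(F^c n G^c)

# Inclusion-exclusion agrees with the direct count.
assert french_or_german == len(F) + len(G) - len(F & G)
print(only_french, not_german, french_or_german, neither)   # 20 30 40 10
```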


1.23. Each student at some college has a mathematics requirement M (to take at least one mathematics course) and a science requirement S (to take at least one science course). A poll of 140 sophomore students shows that:

60 completed M, 45 completed S, 20 completed both M and S

Use a Venn diagram to find the number of students who had completed:

(a) At least one of the two requirements
(b) Exactly one of the two requirements
(c) Neither requirement

Translating the above data into set notation yields:

n(M) = 60, n(S) = 45, n(M ∩ S) = 20, n(U) = 140

Draw a Venn diagram of sets M and S with four regions, as in Fig. 1-9(a). Then, as in Fig. 1-9(b), assign numbers to the four regions as follows:

20 completed both M and S, so n(M ∩ S) = 20
60 − 20 = 40 completed M but not S, so n(M\S) = 40
45 − 20 = 25 completed S but not M, so n(S\M) = 25
140 − 20 − 40 − 25 = 55 completed neither M nor S

By the Venn diagram:

(a) 20 + 40 + 25 = 85 completed M or S. Alternately, we can find n(M ∪ S) without the Venn diagram by using the inclusion-exclusion principle:

n(M ∪ S) = n(M) + n(S) − n(M ∩ S) = 60 + 45 − 20 = 85

(b) 40 + 25 = 65 completed exactly one of the requirements. That is, n(M ⊕ S) = 65.
(c) 55 completed neither requirement. That is, n(M^c ∩ S^c) = 55.

Fig. 1-9 [(a) Venn diagram of M and S with four regions; (b) the same diagram with the counts 40, 20, 25, 55 entered]


1.24. Prove Theorem 1.9 (Inclusion-exclusion principle): Suppose A and B are finite sets. Then A ∪ B and A ∩ B are finite and

n(A ∪ B) = n(A) + n(B) − n(A ∩ B)

Suppose A and B are finite. Then clearly A ∩ B and A ∪ B are finite.
Suppose we count the elements of A and then count the elements of B. Then, every element in A ∩ B would be counted twice, once in A and once in B. Hence, as required,

n(A ∪ B) = n(A) + n(B) − n(A ∩ B)

Alternately (Problem 1.54):

(i) A is the disjoint union of A\B and A ∩ B.
(ii) B is the disjoint union of B\A and A ∩ B.
(iii) A ∪ B is the disjoint union of A\B, A ∩ B, and B\A.

Therefore, by Lemma 1.6 and Corollary 1.7,

n(A ∪ B) = n(A\B) + n(A ∩ B) + n(B\A)
= [n(A) − n(A ∩ B)] + n(A ∩ B) + [n(B) − n(A ∩ B)]
= n(A) + n(B) − n(A ∩ B)
1.25. Show that each set is countable: (a) Z, the set of integers, (b) N × N.

A set S is countable if: (a) S is finite or (b) the elements of S can be listed in the form of a sequence or, in other words, there is a one-to-one correspondence between the positive integers (counting numbers) N = {1, 2, 3, ...} and S.
Neither set is finite.

(a) The following shows a one-to-one correspondence between N and Z:

Counting numbers N:  1   2   3   4   5   6   7   8  ...
                     ↓   ↓   ↓   ↓   ↓   ↓   ↓   ↓
Integers Z:          0   1  −1   2  −2   3  −3   4  ...

That is, n ∈ N corresponds to either n/2, when n is even, or (1 − n)/2, when n is odd:

f(n) = n/2 for n even,
f(n) = (1 − n)/2 for n odd.


Thus Z is countable. 
(b) Figure 1-10 shows that N × N can be written as an infinite sequence as follows:

(1, 1), (2, 1), (1, 2), (1, 3), (2, 2), ...

Specifically, the sequence is determined by “following the arrows” in Fig. 1-10.

(1, 1) (1, 2) (1, 3) (1, 4) ...
(2, 1) (2, 2) (2, 3) (2, 4) ...
(3, 1) (3, 2) (3, 3) (3, 4) ...
(4, 1) (4, 2) (4, 3) (4, 4) ...

Fig. 1-10 
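Both correspondences in Problem 1.25 can be written as short programs. The sketch below is illustrative: `nat_to_int` implements the rule n/2 for even n and (1 − n)/2 for odd n, while `pairs` lists N × N one diagonal at a time. Note that `pairs` always walks each diagonal in the same direction, so its order differs slightly from the zig-zag path of the arrows in Fig. 1-10; either order reaches every pair exactly once.

```python
from itertools import count, islice

def nat_to_int(n):
    """One-to-one correspondence from N = {1, 2, 3, ...} onto Z."""
    return n // 2 if n % 2 == 0 else (1 - n) // 2

def pairs():
    """Enumerate N x N diagonal by diagonal: all (i, j) with i + j = 2, then 3, ..."""
    for total in count(2):
        for i in range(1, total):
            yield (i, total - i)

print([nat_to_int(n) for n in range(1, 9)])   # [0, 1, -1, 2, -2, 3, -3, 4]
print(list(islice(pairs(), 6)))               # [(1, 1), (1, 2), (2, 1), (1, 3), (2, 2), (3, 1)]
```

Since each diagonal i + j = k contains only k − 1 pairs, every pair (i, j) appears after finitely many steps, which is what makes the listing a sequence.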


ORDERED PAIRS AND PRODUCT SETS 
1.26. Find x and y given that (2x, x — 3y) = (6, —9). 


Two ordered pairs are equal if and only if the corresponding entries are equal. This leads to the 
equations 


2x = 6 and x − 3y = −9
Solving the equations yields x = 3, y = 4. 


1.27. Given: A = {1, 2, 3} and B = {a, b}. Find: (a) A × B, (b) B × A, (c) B × B.
(a) A × B consists of all ordered pairs (x, y) where x ∈ A and y ∈ B. Thus
A × B = {(1, a), (1, b), (2, a), (2, b), (3, a), (3, b)}
(b) B × A consists of all ordered pairs (x, y) where x ∈ B and y ∈ A. Thus
B × A = {(a, 1), (a, 2), (a, 3), (b, 1), (b, 2), (b, 3)}
(c) B × B consists of all ordered pairs (x, y) where x, y ∈ B. Thus
B × B = {(a, a), (a, b), (b, a), (b, b)}


Note that, as expected from Theorem 1.11, n(A X B) = 6, n(B X A) = 6, n(B X B) = 4; that is, 
the number of elements in a product set is equal to the product of the numbers of elements in the 
factor sets. 


1.28. Given A = {1, 2}, B = {x, y, z}, C = {3, 4}. Find A × B × C.

A × B × C consists of all ordered triples (a, b, c) where a ∈ A, b ∈ B, c ∈ C. These elements of A × B × C can be systematically obtained by a so-called “tree diagram” as in Fig. 1-11. The elements of A × B × C are precisely the 12 ordered triplets to the right of the diagram.

Observe that n(A) = 2, n(B) = 3, n(C) = 2 and, as expected,

n(A × B × C) = 12 = n(A) · n(B) · n(C)

Fig. 1-11 [tree diagram branching first on a ∈ A, then on b ∈ B, then on c ∈ C; its twelve branches end in the following ordered triples]

(1, x, 3), (1, x, 4), (1, y, 3), (1, y, 4), (1, z, 3), (1, z, 4),
(2, x, 3), (2, x, 4), (2, y, 3), (2, y, 4), (2, z, 3), (2, z, 4)


1.29. Each toss of a coin will yield either a head or a tail. Let C = {H, T} denote the set of outcomes. Find C^3, n(C^3), and explain what C^3 represents.

Since n(C) = 2, we have n(C^3) = 2^3 = 8. Omitting certain commas and parentheses for notational convenience,

C^3 = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}

C^3 represents all possible sequences of outcomes of three tosses of the coin.
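Product sets such as those in Problems 1.27 through 1.29 can be generated with `itertools.product` from the Python standard library. The following sketch is only an illustration; the elements x, y, z of B are represented by the strings "x", "y", "z".

```python
from itertools import product

# Three tosses of a coin: the product C x C x C.
C = ["H", "T"]
C3 = ["".join(outcome) for outcome in product(C, repeat=3)]
print(C3)          # ['HHH', 'HHT', 'HTH', 'HTT', 'THH', 'THT', 'TTH', 'TTT']

# A triple product of three different sets, as in a tree diagram.
A = [1, 2]
B = ["x", "y", "z"]
D = [3, 4]
triples = list(product(A, B, D))
print(len(triples) == len(A) * len(B) * len(D))   # the counting rule for product sets
```

`product` varies the last coordinate fastest, which is the same left-to-right order in which a tree diagram is read.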
1.30. Prove: A × (B ∩ C) = (A × B) ∩ (A × C).

A × (B ∩ C) = {(x, y) : x ∈ A, y ∈ B ∩ C}
= {(x, y) : x ∈ A, y ∈ B, y ∈ C}
= {(x, y) : x ∈ A, y ∈ B, x ∈ A, y ∈ C}
= {(x, y) : (x, y) ∈ A × B, (x, y) ∈ A × C}
= (A × B) ∩ (A × C)


CLASSES OF SETS AND PARTITIONS 


1.31. Consider the set A = [{1, 2, 3}, {4, 5}, {6, 7, 8}]. (a) Find the elements of A. (b) Find n(A).

(a) A is a collection of sets; its elements are the sets {1, 2, 3}, {4, 5}, and {6, 7, 8}.
(b) A has only three elements; hence n(A) = 3.

1.32. Consider the class A of sets in Problem 1.31. Determine whether each of the following is true or false:

(a) 1 ∈ A (c) {6, 7, 8} ∈ A (e) ∅ ∈ A

(b) {1, 2, 3} ⊆ A (d) {{4, 5}} ⊆ A (f) ∅ ⊆ A


(a) False. 1 is not one of the three elements of A. 

(b) False. {1,2,3} is not a subset of A; it is one of the elements of A. 

(c) True. {6,7,8} is one of the elements of A. 

(d) True. {{4,5}}, the set consisting of the element {4,5}, is a subset of A. 


(e) False. The empty set ∅ is not an element of A, that is, it is not one of the three sets listed as elements of A.

(f) True. The empty set ∅ is a subset of every set, even a class of sets.


1.33. List the elements of the power set 𝒫(A) of A = {a, b, c, d}.

The elements of 𝒫(A) are the subsets of A. Hence:

𝒫(A) = [A, {a, b, c}, {a, b, d}, {a, c, d}, {b, c, d}, {a, b}, {a, c},
{a, d}, {b, c}, {b, d}, {c, d}, {a}, {b}, {c}, {d}, ∅]

As expected, 𝒫(A) has 2^4 = 16 elements.


1.34. Let S = {a, b, c, d, e, f, g}. Determine which of the following are partitions of S:

(a) P_1 = [{a, c, e}, {b}, {d, g}] (c) P_3 = [{a, b, e, g}, {c}, {d, f}]

(b) P_2 = [{a, e, g}, {c, d}, {b, e, f}] (d) P_4 = [{a, b, c, d, e, f, g}]

(a) P_1 is not a partition of S since f ∈ S does not belong to any of the cells.

(b) P_2 is not a partition of S since e ∈ S belongs to two of the cells, {a, e, g} and {b, e, f}.
(c) P_3 is a partition of S since each element in S belongs to exactly one cell.

(d) P_4 is a partition of S into one cell, S itself.
1.35. Find all partitions of S = {a, b, c, d}.

Note first that each partition of S contains either one, two, three, or four distinct cells. The partitions are as follows:

(1) [{a, b, c, d}] = [S]

(2a) [{a}, {b, c, d}], [{b}, {a, c, d}], [{c}, {a, b, d}], [{d}, {a, b, c}]

(2b) [{a, b}, {c, d}], [{a, c}, {b, d}], [{a, d}, {b, c}]

(3) [{a}, {b}, {c, d}], [{a}, {c}, {b, d}], [{a}, {d}, {b, c}], [{b}, {c}, {a, d}],
[{b}, {d}, {a, c}], [{c}, {d}, {a, b}]

(4) [{a}, {b}, {c}, {d}]

[Note (2a) refers to partitions with one-element and three-element cells, whereas (2b) refers to partitions with two two-element cells.] There are 1 + 4 + 3 + 6 + 1 = 15 different partitions of S.
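The count of 15 can be confirmed by generating the partitions recursively. The sketch below uses one standard approach (it is not the method of the text): to partition a list, partition everything after the first element, and then either add the first element to one of the existing cells or give it a cell of its own. The function name `partitions` is an illustrative choice.

```python
from collections import Counter

def partitions(items):
    """Yield every partition of the list `items` as a list of cells (lists)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in partitions(rest):
        # Put `first` into each existing cell in turn ...
        for i, cell in enumerate(smaller):
            yield smaller[:i] + [[first] + cell] + smaller[i + 1:]
        # ... or let `first` form a new one-element cell.
        yield [[first]] + smaller

all_parts = list(partitions(["a", "b", "c", "d"]))
print(len(all_parts))                                            # total number of partitions
print(sorted(Counter(len(p) for p in all_parts).items()))        # partitions grouped by number of cells
```

Grouping by the number of cells gives 1, 7, 6, and 1 partitions with one, two, three, and four cells, where the 7 two-cell partitions are the 4 of type (2a) plus the 3 of type (2b).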


1.36. Let N = {1, 2, 3, ...} and, for each n ∈ N, let

A_n = {x : x is a multiple of n} = {n, 2n, 3n, ...}

Find: (a) A_3 ∩ A_5, (b) A_4 ∩ A_6, (c) ∪_{i∈Q} A_i, where Q = {2, 3, 5, 7, 11, ...} is the set of prime numbers.

(a) Those numbers which are multiples of both 3 and 5 are the multiples of 15; hence

A_3 ∩ A_5 = A_15

(b) The multiples of 12 and no other numbers belong to both A_4 and A_6; hence

A_4 ∩ A_6 = A_12

(c) Every positive integer except 1 is a multiple of at least one prime number; hence

∪_{i∈Q} A_i = {2, 3, 4, ...} = N\{1}
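Because the sets of multiples are infinite, a program can only examine them up to some cutoff. The sketch below truncates every set at a hypothetical bound `LIMIT` (an arbitrary choice made here) and checks the three answers on that finite range; it illustrates the statements but does not prove them.

```python
LIMIT = 600   # hypothetical cutoff; the true sets are infinite

def multiples(n):
    """The multiples of n that do not exceed LIMIT."""
    return {x for x in range(1, LIMIT + 1) if x % n == 0}

def is_prime(p):
    return p > 1 and all(p % d for d in range(2, int(p ** 0.5) + 1))

# (a) and (b): common multiples are the multiples of the least common multiple.
assert multiples(3) & multiples(5) == multiples(15)
assert multiples(4) & multiples(6) == multiples(12)

# (c): the union over all primes covers every integer from 2 up to LIMIT.
union_over_primes = set()
for p in range(2, LIMIT + 1):
    if is_prime(p):
        union_over_primes |= multiples(p)
print(union_over_primes == set(range(2, LIMIT + 1)))
```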


1.37. For each n ∈ N, let B_n = (0, 1/n), the open interval from 0 to 1/n. [For example, B_1 = (0, 1), B_2 = (0, 1/2), B_5 = (0, 1/5).] Find:

(a) B_3 ∪ B_7 and B_3 ∩ B_7

(b) ∪_{n∈A} B_n where A is a nonempty subset of N

(c) ∩_{n∈N} B_n

(a) Since B_7 is a subset of B_3, we have B_3 ∪ B_7 = B_3 and B_3 ∩ B_7 = B_7.
(b) Let k be the smallest element of A. Then ∪_{n∈A} B_n = B_k.

(c) Let x be any real number. Then there is at least one k ∈ N such that x ∉ (0, 1/k) = B_k. Thus ∩_{n∈N} B_n = ∅.


1.38. Prove: Let {A_i : i ∈ I} be an indexed collection of sets, and let i_0 ∈ I. Then

∩_{i∈I} A_i ⊆ A_{i_0} ⊆ ∪_{i∈I} A_i

Let x ∈ ∩_{i∈I} A_i; then x ∈ A_i for every i ∈ I. In particular, x ∈ A_{i_0}. Hence ∩_{i∈I} A_i ⊆ A_{i_0}. Now let y ∈ A_{i_0}. Since i_0 ∈ I, y ∈ ∪_{i∈I} A_i. Hence A_{i_0} ⊆ ∪_{i∈I} A_i.
1.39. Prove (DeMorgan’s law): For any indexed collection {A_i : i ∈ I} of sets,

(∪_i A_i)^c = ∩_i A_i^c

Using the definitions of union and intersection of indexed classes of sets, we get:

(∪_i A_i)^c = {x : x ∉ ∪_i A_i} = {x : x ∉ A_i, for every i}
= {x : x ∈ A_i^c, for every i} = ∩_i A_i^c


1.40. Let 𝒜 be an algebra (σ-algebra) of subsets of U. Show that:
(a) U and ∅ belong to 𝒜. (b) 𝒜 is closed under finite (countable) intersections.

Recall that, by definition, 𝒜 is nonempty and 𝒜 is closed under complements and finite (countable) unions.

(a) Since 𝒜 is nonempty, there is a set A ∈ 𝒜. Hence the complement A^c belongs to 𝒜. Therefore, the union and complement,

A ∪ A^c = U and U^c = ∅

belong to 𝒜, as required.

(b) Let {A_i} be a finite (countable) collection of sets belonging to 𝒜. Each complement A_i^c belongs to 𝒜, and so does the union ∪_i A_i^c together with its complement. Therefore, by DeMorgan’s law (Problem 1.39),

(∪_i A_i^c)^c = ∩_i (A_i^c)^c = ∩_i A_i

Hence ∩_i A_i belongs to 𝒜, as required.


MATHEMATICAL INDUCTION 
1.41. Prove the assertion A(n) that the sum of the first n positive integers is (1/2)n(n + 1); that is,

A(n): 1 + 2 + 3 + ··· + n = (1/2)n(n + 1)

The assertion holds for n = 1 since

A(1): 1 = (1/2)(1)(1 + 1)

Assuming A(n) is true, we add n + 1 to both sides of A(n). This yields

1 + 2 + 3 + ··· + n + (n + 1) = (1/2)n(n + 1) + (n + 1)
= (1/2)[n(n + 1) + 2(n + 1)]
= (1/2)[(n + 1)(n + 2)]

which is A(n + 1). That is, A(n + 1) is true whenever A(n) is true. By the principle of induction, A(n) is true for all n ≥ 1.


1.42. Prove the following assertion (for n ≥ 0):

A(n): 1 + 2 + 2^2 + 2^3 + ··· + 2^n = 2^{n+1} − 1

A(0) is true since 1 = 2^1 − 1. Assuming A(n) is true, we add 2^{n+1} to both sides of A(n). This yields:

1 + 2 + 2^2 + 2^3 + ··· + 2^n + 2^{n+1} = 2^{n+1} − 1 + 2^{n+1}
= 2(2^{n+1}) − 1
= 2^{n+2} − 1

which is A(n + 1). Thus, A(n + 1) is true whenever A(n) is true. By the principle of induction, A(n) is true for all n ≥ 0.
1.43. Prove: n^2 ≥ 2n + 1 for n ≥ 3.

Since 3^2 = 9 and 2(3) + 1 = 7, the formula is true for n = 3. Assuming n^2 ≥ 2n + 1, we have

(n + 1)^2 = n^2 + 2n + 1 ≥ (2n + 1) + 2n + 1 = 2n + 2 + 2n ≥ 2n + 2 + 1 = 2(n + 1) + 1

Thus, the formula is true for n + 1. By induction, the formula is true for all n ≥ 3.


1.44. Prove: n! ≥ 2^n for n ≥ 4.

Since 4! = 1·2·3·4 = 24 and 2^4 = 16, the formula is true for n = 4. Assuming n! ≥ 2^n and n + 1 ≥ 2, we have

(n + 1)! = n!(n + 1) ≥ 2^n(n + 1) ≥ 2^n(2) = 2^{n+1}

Thus, the formula is true for n + 1. By induction, the formula is true for all n ≥ 4.
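The four induction results in Problems 1.41 through 1.44 (two sum formulas and two inequalities) can be spot-checked numerically. The sketch below is illustrative only: the function name `check_induction_claims` and the default cutoff are arbitrary, and a finite check does not replace the proofs.

```python
from math import factorial

def check_induction_claims(limit=60):
    """Spot-check the results of Problems 1.41-1.44 for all n below `limit`."""
    for n in range(1, limit):
        assert sum(range(1, n + 1)) == n * (n + 1) // 2                # Problem 1.41
    for n in range(0, limit):
        assert sum(2 ** k for k in range(n + 1)) == 2 ** (n + 1) - 1   # Problem 1.42
    for n in range(3, limit):
        assert n ** 2 >= 2 * n + 1                                     # Problem 1.43
    for n in range(4, limit):
        assert factorial(n) >= 2 ** n                                  # Problem 1.44
    return True

print(check_induction_claims())
# The starting values matter: each inequality fails just below its stated range.
print(2 ** 2 >= 2 * 2 + 1, factorial(3) >= 2 ** 3)
```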


MISCELLANEOUS PROBLEMS 
1.45. Show that A is the disjoint union of A\B and A ∩ B; that is, show that:
(a) A = (A\B) ∪ (A ∩ B), (b) (A\B) ∩ (A ∩ B) = ∅.
(a) By Problem 1.13, A\B = A ∩ B^c. Using the Distributive Law and the Complement Law, we get
(A\B) ∪ (A ∩ B) = (A ∩ B^c) ∪ (A ∩ B) = A ∩ (B^c ∪ B) = A ∩ U = A
(b) Also,
(A\B) ∩ (A ∩ B) = (A ∩ B^c) ∩ (A ∩ B) = A ∩ (B^c ∩ B) = A ∩ ∅ = ∅


1.46. Prove Corollary 1.10. Suppose A, B, C are finite sets. Then A ∪ B ∪ C is finite and

n(A ∪ B ∪ C) = n(A) + n(B) + n(C) − n(A ∩ B) − n(A ∩ C) − n(B ∩ C) + n(A ∩ B ∩ C)

Clearly A ∪ B ∪ C is finite when A, B, C are finite. Using

(A ∪ B) ∩ C = (A ∩ C) ∪ (B ∩ C) and (A ∩ C) ∩ (B ∩ C) = A ∩ B ∩ C

and using Theorem 1.9 repeatedly, we have

n(A ∪ B ∪ C) = n(A ∪ B) + n(C) − n[(A ∩ C) ∪ (B ∩ C)]
= [n(A) + n(B) − n(A ∩ B)] + n(C) − [n(A ∩ C) + n(B ∩ C) − n(A ∩ B ∩ C)]
= n(A) + n(B) + n(C) − n(A ∩ B) − n(A ∩ C) − n(B ∩ C) + n(A ∩ B ∩ C)


as required. 
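The three-set formula can be exercised on randomly generated finite sets. The sketch below is an illustration, not part of the proof; the function name and the use of a seeded `random.Random` generator are choices made here so that the run is repeatable.

```python
import random

def union_size_by_inclusion_exclusion(A, B, C):
    """Right-hand side of the three-set inclusion-exclusion formula."""
    return (len(A) + len(B) + len(C)
            - len(A & B) - len(A & C) - len(B & C)
            + len(A & B & C))

rng = random.Random(0)          # fixed seed so the trial sets are reproducible
for _ in range(1000):
    # Three random subsets of {0, 1, ..., 19}, each of a random size.
    A, B, C = (set(rng.sample(range(20), rng.randint(0, 20))) for _ in range(3))
    assert len(A | B | C) == union_size_by_inclusion_exclusion(A, B, C)

print("formula agreed with a direct count on 1000 random triples")
```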


Supplementary Problems 


SETS, ELEMENTS, SUBSETS 
1.47. Which of the following sets are equal?

A = {x : x^2 − 4x + 3 = 0}, C = {x : x ∈ N, x < 3}, E = {1, 2}, G = {3, 1},
B = {x : x^2 − 3x + 2 = 0}, D = {x : x ∈ N, x is odd, x < 5}, F = {1, 2, 1}, H = {1, 1, 3}
1.48. List the elements of the following sets if the universal set is the English alphabet U = {a, b, c, ..., y, z}. Furthermore, identify which of the sets are equal.

A = {x : x is a vowel}, C = {x : x precedes f in the alphabet},
B = {x : x is a letter in the word “little”}, D = {x : x is a letter in the word “title”}

1.49. Let A = {1, 2, ..., 8, 9}, B = {2, 4, 6, 8}, C = {1, 3, 5, 7, 9}, D = {3, 4, 5}, E = {3, 5}. Which of the above sets can equal a set X under each of the given conditions?

(a) X and B are disjoint (c) X ⊆ A but X ⊈ C
(b) X ⊆ D but X ⊈ B (d) X ⊆ C but X ⊈ A


SET OPERATIONS 


1.50. Given the universal set U = {1, 2, 3, ..., 8, 9} and the sets:

A = {1, 2, 5, 6}, B = {2, 5, 7}, C = {1, 3, 5, 7, 9}

Find: (a) A ∩ B and A ∩ C, (b) A ∪ B and A ∪ C, (c) A^c and C^c.


1.51. For the sets in Problem 1.50, find: (a) A\B and A\C, (b) A ⊕ B and A ⊕ C.

1.52. For the sets in Problem 1.50, find: (a) (A ∪ C)\B, (b) (A ∪ B)^c, (c) (B ⊕ C)\A.

1.53. Let A = {a, b, c, d, e}, B = {a, b, d, f, g}, C = {b, c, e, g, h}, D = {d, e, f, g, h}. Find:

(a) AN(BUD) (c) (AND)UB (ec) BNCND (g) (ABONB 
(b) B\(CUD) (d) (AUD)\C (f) (C\A)\D (h) (A®D)\B 

1.54. Let A and B be any sets. Prove A ∪ B is the disjoint union of A\B, A ∩ B, and B\A.

1.55. Prove the following:

(a) A ⊆ B if and only if A ∩ B^c = ∅. (c) A ⊆ B if and only if B^c ⊆ A^c.
(b) A ⊆ B if and only if A^c ∪ B = U. (d) A ⊆ B if and only if A\B = ∅.
(Compare with Theorem 1.4.)

1.56. The formula A\B = A ∩ B^c defines the difference operation in terms of the operations of intersection and complement. Find a formula which defines the union A ∪ B in terms of the operations of intersection and complement.


VENN DIAGRAMS, ALGEBRA OF SETS, DUALITY 


1.57. The Venn diagram in Fig. 1-12 shows sets A, B, C. Shade the following sets:
(a) A\(B ∪ C) (b) A^c ∩ (B ∪ C) (c) (A ∪ C) ∩ (B ∪ C)

1.58. Write the dual of each equation:

(a) A ∪ (A ∩ B) = A (b) (A ∩ B) ∪ (A^c ∩ B) ∪ (A ∩ B^c) ∪ (A^c ∩ B^c) = U

1.59. Use the laws in Table 1-1 to prove: (A ∩ B) ∪ (A ∩ B^c) = A.

FINITE SETS AND THE COUNTING PRINCIPLE, COUNTABLE SETS 


1.60. Determine which of the following sets are finite:

(a) Lines parallel to the x axis (c) Animals living on the earth
(b) Letters in the English alphabet (d) Circles through the origin (0, 0)

1.61. Given n(U) = 20, n(A) = 12, n(B) = 9, n(A ∩ B) = 4. Find:
(a) n(A ∪ B) (b) n(A^c) (c) n(B^c) (d) n(A\B) (e) n(∅)

1.62. Among 120 Freshmen at a college, 40 take mathematics, 50 take English, and 15 take both mathematics and English. Find the number of the Freshmen who:

(a) Do not take mathematics (d) Take English but not mathematics
(b) Take mathematics or English (e) Take exactly one of the two subjects
(c) Take mathematics but not English (f) Take neither mathematics nor English

1.63. In a survey of 60 people, it was found that 25 read Newsweek magazine, 26 read Time, and 23 read Fortune. Also, 9 read both Newsweek and Fortune, 11 read Newsweek and Time, 8 read both Time and Fortune, and 3 read all three magazines.

(a) Figure 1-13 is a Venn diagram of three sets, N (Newsweek), T (Time), and F (Fortune). Fill in the correct number of people in each of the eight regions of the Venn diagram.

(b) Find the number of people who read: (i) only Newsweek, (ii) only Time, (iii) only Fortune, (iv) Newsweek and Time, but not Fortune, (v) only one of the magazines, (vi) none of the magazines.

Fig. 1-13

1.64. Let A_1, A_2, A_3, ... be a sequence of finite sets. Show that the union S = ∪_i A_i is countable.

1.65. Let A_1, A_2, A_3, ... be a sequence of pairwise disjoint countably infinite sets. Show that the union T = ∪_i A_i is countable.


PRODUCT SETS 


1.66. Find x and y if: (a) (x + 3, 3) = (5, 3x + y), (b) (x − 3y, 5) = (7, x − y).

1.67. Find x, y, z if (2x, x + y, x − y − 2z) = (4, −1, 3).
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1.68. Let A = {a,b} and B = {1,2,3,4}. Find (a) A X B and (b) BX A. 

1.69. Let C = {H, T}, the set of possible outcomes if a coin is tossed. Find: 
(a) C? = CX Cand (b) C*?'=CXCXCXC. 

1.70. Suppose n(A) = 2 and n(B) = 6. Find the number of elements in: 
(a) AX B, BX A, (b) A’, B’, A*, B8, (c) AXAXBXA. 

CLASSES OF SETS AND PARTITIONS 

1.71. Find the power set P(A) of A = {a, b,c, d, e}. 

1.72. Let S = {1,2,3,4,5,6}. Determine whether each of the following is a partition of S: 
(a) [{1, 3, 5}, {2, 4}, {3, 6}]           (d) [{1}, {3, 6}, {2, 4, 5}, {3, 6}]
(b) [{1, 5}, {2}, {3, 6}]                 (e) [{1, 2, 3, 4, 5, 6}]
(c) [{1, 5}, {2}, {4}, {3, 6}]            (f) [{1}, {2}, {3}, {4}, {5}, {6}]


1.73. Find all partitions of S = {1, 2, 3}. 


1.74. For each positive integer $n \in N$, let $A_n = \{n, 2n, 3n, \ldots\}$, the multiples of n. Find:

(a) $A_2 \cap A_7$, (b) $A_6 \cap A_8$, (c) $A_3 \cup A_{12}$, (d) $A_3 \cap A_{12}$, (e) $A_s \cup A_{st}$, where $s, t \in N$,
(f) $A_s \cap A_{st}$, where $s, t \in N$.


1.75. Prove: If $J \subseteq N$ is infinite, then $\bigcap (A_i : i \in J) = \emptyset$. (Here the $A_i$ are the sets in Problem 1.74.)


1.76. Let $[A_1, A_2, \ldots, A_m]$ and $[B_1, B_2, \ldots, B_n]$ be partitions of S. Show that the collection of sets
$$[A_i \cap B_j : i = 1, \ldots, m,\ j = 1, \ldots, n] \setminus \emptyset$$
(where the empty set $\emptyset$ is deleted) is also a partition of S. (It is called the cross partition.)


1.77. Prove: For any indexed class of sets $\{A_i : i \in J\}$ and any set B:
(a) $B \cup (\bigcap_i A_i) = \bigcap_i (B \cup A_i)$, (b) $B \cap (\bigcup_i A_i) = \bigcup_i (B \cap A_i)$


1.78. Prove (DeMorgan's law): $(\bigcup_i A_i)^c = \bigcap_i A_i^c$.


1.79. Show that each of the following is an algebra of subsets of U:
(a) $\mathcal{A} = \{\emptyset, U\}$, (b) $\mathcal{B} = \{\emptyset, A, A^c, U\}$, (c) $P(U)$, the power set of U


1.80. Let $\mathcal{A}$ and $\mathcal{B}$ be algebras (σ-algebras) of subsets of U. Prove that the intersection $\mathcal{A} \cap \mathcal{B}$ is also an
algebra (σ-algebra) of subsets of U.


MATHEMATICAL INDUCTION 


1.81. Prove: $2 + 4 + 6 + \cdots + 2n = n(n + 1)$.
1.82. Prove: $1 + 4 + 7 + \cdots + (3n - 2) = \dfrac{n(3n - 1)}{2}$.
1.83. Prove: $1^2 + 2^2 + 3^2 + \cdots + n^2 = \dfrac{n(n + 1)(2n + 1)}{6}$.
1.84. Prove: For n = 3, we have 2” = n’. 
1.85. Prove: $\dfrac{1}{1 \cdot 3} + \dfrac{1}{3 \cdot 5} + \dfrac{1}{5 \cdot 7} + \cdots + \dfrac{1}{(2n - 1)(2n + 1)} = \dfrac{n}{2n + 1}$.
Answers to Supplementary Problems


$B = C = E = F$, $A = D = G = H$.

$A = \{a, e, i, o, u\}$; $B = D = \{l, i, t, e\}$; $C = \{a, b, c, d, e\}$.

(a) C and E; (b) D and E; (c) A, B, D; (d) None.

(a) $A \cap B = \{2, 5\}$, $A \cap C = \{1, 5\}$; (b) $A \cup B = \{1, 2, 5, 6, 7\}$, $B \cup C = \{1, 2, 3, 5, 7, 9\}$;
(c) $A^c = \{3, 4, 7, 8, 9\}$, $C^c = \{2, 4, 6, 8\}$.

(a) $A \setminus B = \{1, 6\}$, $A \setminus C = \{2, 6\}$; (b) $A \oplus B = \{1, 6, 7\}$, $A \oplus C = \{2, 3, 6, 7, 9\}$.

(a) $(A \cup C) \setminus B = \{1, 3, 6, 9\}$; (b) $(A \cup B)^c = \{3, 4, 8, 9\}$; (c) $(B \oplus C) \setminus A = \{3, 9\}$.

(a) $A \cap (B \cup D) = \{a, b, d, e\}$; (b) $B \setminus (C \cup D) = \{a\}$; (c) $(A \cap D) \cup B = \{a, b, d, e, f, g\}$;
(d) $(A \cup D) \setminus C = \{a, d, f\}$; (e) $B \cap C \cap D = \{g\}$; (f) $(C \setminus A) \setminus D = \emptyset$; (g) $(A \oplus C) \cap B = \{a, d, g\}$;
(h) $(A \oplus D) \setminus B = \{c, h\}$.

$A \cup B = (A^c \cap B^c)^c$.

See Fig. 1-14.

Fig. 1-14

1.58. (a) $A \cap (A \cup B) = A$; (b) $(A \cup B) \cap (A^c \cup B) \cap (A \cup B^c) \cap (A^c \cup B^c) = \emptyset$

(a) Infinite; (b) finite; (c) finite; (d) infinite.

(a) $n(A \cup B) = 17$; (b) $n(A^c) = 8$; (c) $n(B^c) = 11$; (d) $n(A \setminus B) = 8$; (e) $n(\emptyset) = 0$.

(a) 80; (b) 75; (c) 25; (d) 35; (e) 60; (f) 45.

(a) See Fig. 1-15; (b) (i) 8, (ii) 10, (iii) 9, (iv) 8, (v) 27, (vi) 11.

Fig. 1-15
1.64. Let $B_1 = A_1$, $B_2 = A_2 \setminus A_1$, $B_3 = A_3 \setminus (A_1 \cup A_2)$, ..., that is, $B_k = A_k \setminus (A_1 \cup \cdots \cup A_{k-1})$. Then the $B_k$ are finite and pairwise
disjoint, and $S = \bigcup_i A_i = \bigcup_i B_i$. Say

$$B_k = \{b_{k1}, b_{k2}, \ldots, b_{kn_k}\}$$

Then S can be written as a sequence as follows:

$$S = \{b_{11}, b_{12}, \ldots, b_{1n_1}, b_{21}, b_{22}, \ldots, b_{2n_2}, \ldots\}$$

That is, first write down the elements of $B_1$, then the elements of $B_2$, and so on.

1.65. Suppose

$$A_1 = \{a_{11}, a_{12}, a_{13}, \ldots\}, \quad A_2 = \{a_{21}, a_{22}, a_{23}, \ldots\}, \quad \ldots$$

For $n > 1$, define $D_n = \{a_{ij} : i + j = n\}$. For example,

$$D_2 = \{a_{11}\}, \quad D_3 = \{a_{12}, a_{21}\}, \quad D_4 = \{a_{31}, a_{22}, a_{13}\}, \quad \ldots$$

Each $D_n$ is finite and $T = \bigcup_n D_n$. By Problem 1.64, T is countable.

1.66. (a) $x = 2$, $y = -3$; (b) $x = 4$, $y = -1$.

1.67. $x = 2$, $y = -3$, $z = 1$.

1.68. (a) $A \times B = \{(a, 1), (a, 2), (a, 3), (a, 4), (b, 1), (b, 2), (b, 3), (b, 4)\}$
(b) $B \times A = \{(1, a), (2, a), (3, a), (4, a), (1, b), (2, b), (3, b), (4, b)\}$.

1.69. Note $n(C^2) = 2^2 = 4$ and $n(C^4) = 2^4 = 16$.

(a) $C^2 = C \times C$ = {HH, HT, TH, TT};

(b) $C^4 = C \times C \times C \times C$ = {HHHH, HHHT, HHTH, HHTT, HTHH, HTHT, HTTH, HTTT, THHH,
THHT, THTH, THTT, TTHH, TTHT, TTTH, TTTT}.

1.70. (a) 12, 12; (b) 4, 36, 8, 216; (c) 48.

1.71. Note $P(A)$ has $2^5 = 32$ elements; that is, there are 32 subsets of A. Each subset has at most five elements,
and we list them in terms of their numbers of elements:

None (1): ∅

One (5): {a}, {b}, {c}, {d}, {e}

Two (10): {a, b}, {a, c}, {a, d}, {a, e}, {b, c}, {b, d}, {b, e}, {c, d}, {c, e}, {d, e}

Three (10): {a, b, c}, {a, b, d}, {a, b, e}, {a, c, d}, {a, c, e}, {a, d, e}, {b, c, d}, {b, c, e}, {b, d, e}, {c, d, e}

Four (5): {a, b, c, d}, {a, b, c, e}, {a, b, d, e}, {a, c, d, e}, {b, c, d, e}

Five (1): A = {a, b, c, d, e}

1.72. (a) and (b): No. Others: Yes.

1.73. There are five: [S], [{1, 2}, {3}], [{1, 3}, {2}], [{1}, {2, 3}], [{1}, {2}, {3}].

1.74. (a) $A_{14}$; (b) $A_{24}$; (c) $A_3$; (d) $A_{12}$; (e) $A_s$; (f) $A_{st}$.

1.75. Let $n \in N$ and let $B = \bigcap (A_i : i \in J)$. Since J is infinite, there exists $k \in J$ such that $n < k$. Thus $n \notin A_k$
and so $n \notin B$. That is, for every n, we have shown $n \notin B$. Thus $B = \emptyset$.


Techniques of Counting


2.1 INTRODUCTION 


This chapter develops some techniques for determining, without direct enumeration, the number 
of possible outcomes of a particular experiment or event or the number of elements in a particular set. 
Such sophisticated counting is sometimes called combinatorial analysis. 


2.2 BASIC COUNTING PRINCIPLES 


There are two basic counting principles which are used throughout this chapter. One involves 
addition and the other involves multiplication. 


Sum Rule Principle 


The first counting principle follows: 


Sum Rule Principle: Suppose some event E can occur in m ways
and a second event F can occur in n ways, and suppose both
events cannot occur simultaneously. Then E or F can occur in
m + n ways.


This principle can be stated in terms of sets, and it is simply a restatement of Lemma 1.4. 


Sum Rule Principle: Suppose A and B are disjoint sets. Then: 


n(A U B) = n(A) + n(B) 


Clearly, this principle can be extended to three or more events. That is, suppose an event $E_1$ can
occur in $n_1$ ways, a second event $E_2$ can occur in $n_2$ ways, a third event $E_3$ can occur in $n_3$ ways, and
so on, and suppose no two of the events can occur at the same time. Then one of the events can occur
in $n_1 + n_2 + n_3 + \cdots$ ways.
EXAMPLE 2.1 


(a) Suppose there are 8 male professors and 5 female professors teaching a calculus class. A student can choose 
a calculus professor in 8 + 5 = 13 ways. 


(b) Suppose there are 3 different mystery novels, 5 different romance novels, and 4 different adventure novels 
on a bookshelf. Then there are 


n=3+5+4=12 


ways to choose one of the novels. 
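
The set form of the sum rule can be checked mechanically. The sketch below is illustrative only: the book labels (`M1`, `R1`, ...) and the helper name `count_one_choice` are assumptions made for this example, not part of the text.

```python
# Sum rule: for pairwise disjoint sets, n(A ∪ B ∪ C) = n(A) + n(B) + n(C).
# The labels below are hypothetical stand-ins for the novels in Example 2.1(b).
mystery = {"M1", "M2", "M3"}
romance = {"R1", "R2", "R3", "R4", "R5"}
adventure = {"A1", "A2", "A3", "A4"}


def count_one_choice(*groups):
    """Count the ways to pick one item from pairwise disjoint groups."""
    union = set().union(*groups)
    total = sum(len(group) for group in groups)
    # The sum rule applies only when no item belongs to two groups.
    if len(union) != total:
        raise ValueError("groups overlap; the sum rule does not apply")
    return total


novel_choices = count_one_choice(mystery, romance, adventure)
print(novel_choices)  # 12, matching n = 3 + 5 + 4
```

The overlap check mirrors the hypothesis that no two of the events can occur at the same time.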


Product Rule Principle 


The second counting principle follows: 


Product Rule Principle: Suppose an event E can occur in m ways
and, independent of this event, an event F can occur in n
ways. Then combinations of events E and F can occur in mn
ways.


This principle can also be stated in terms of sets, and it is simply a restatement of Theorem 
1.11. 


Product Rule Principle: Suppose A and B are finite sets. Then: 


n(A X B) = n(A) -n(B) 


Clearly, this principle can also be extended to three or more events. That is, suppose an event $E_1$
can occur in $n_1$ ways, then a second event $E_2$ can occur in $n_2$ ways, then a third event $E_3$ can occur in
$n_3$ ways, and so on. Then all of the events can occur in $n_1 \cdot n_2 \cdot n_3 \cdots$ ways.


EXAMPLE 2.2 
(a) Suppose a restaurant has 3 different appetizers and 4 different entrees. Then there are 
n = 3(4) = 12 
different ways to order an appetizer and an entree. 


(b) Suppose airline A has 3 daily flights between Boston and Chicago, and airline B has 2 daily flights between 
Boston and Chicago. 


(1) There are n = 3(2) = 6 ways to fly airline A from Boston to Chicago, and then airline B from Chicago 
back to Boston. 


(2) There are m = 3 + 2 = 5 ways to fly from Boston to Chicago; and hence n = 5(5) = 25 ways to fly from 
Boston to Chicago and then back again. 


(c) Suppose a college has 3 different history courses, 4 different literature courses, and 2 different science 
courses (with no prerequisites). 


(1) Suppose a student has to choose one of each of the courses. The number of ways to do this is: 


n = 3(4)(2) = 24 
(2) Suppose a student only needs to choose one of the courses. Clearly, there are

m = 3 + 4 + 2 = 9

courses, and so the student will have 9 choices. In other words, here the sum rule is used rather than
the multiplication rule since only one of the courses is chosen.
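
As a rough check of Example 2.2(c), the product set can be enumerated directly. This is a minimal sketch; the course labels are hypothetical placeholders.

```python
from itertools import product

# Hypothetical labels for the 3 history, 4 literature, and 2 science courses.
history = ["H1", "H2", "H3"]
literature = ["L1", "L2", "L3", "L4"]
science = ["S1", "S2"]

# Product rule: one course of each kind is an element of the product set.
schedules = list(product(history, literature, science))
one_of_each = len(schedules)

# Sum rule: exactly one course in total, chosen from disjoint lists.
any_one = len(history) + len(literature) + len(science)

print(one_of_each, any_one)  # 24 9
```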


2.3 FACTORIAL NOTATION 


The product of the positive integers from 1 to n inclusive occurs very often in mathematics and 
hence it is denoted by the special symbol n!, read "n factorial". That is,

$$n! = 1 \cdot 2 \cdot 3 \cdots (n - 2)(n - 1)n = n(n - 1)(n - 2) \cdots 3 \cdot 2 \cdot 1$$

In other words, n! may be defined by

$$1! = 1 \quad \text{and} \quad n! = n \cdot (n - 1)!$$


It is also convenient to define 0! = 1. 
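
The recursive definition translates almost word for word into code. The function below is a sketch written for this illustration (the name `factorial` is our own); Python's standard library already offers `math.factorial`, which is used here only as a cross-check.

```python
import math


def factorial(n: int) -> int:
    """Return n! using 0! = 1 and n! = n * (n - 1)!."""
    if n < 0:
        raise ValueError("n! is defined here only for n >= 0")
    if n == 0:
        return 1  # the convention 0! = 1
    return n * factorial(n - 1)


for n in (0, 1, 2, 3, 4, 5):
    # Each value agrees with the library implementation.
    assert factorial(n) == math.factorial(n)

print(factorial(5))  # 120
```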


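EXAMPLE 2.3

(a) $2! = 2 \cdot 1 = 2$; $3! = 3 \cdot 2 \cdot 1 = 6$; $4! = 4 \cdot 3 \cdot 2 \cdot 1 = 24$; $5! = 5 \cdot 4! = 5 \cdot 24 = 120$

(b) $\dfrac{8!}{6!} = \dfrac{8 \cdot 7 \cdot 6!}{6!} = 8 \cdot 7 = 56$; $12 \cdot 11 \cdot 10 = \dfrac{12 \cdot 11 \cdot 10 \cdot 9!}{9!} = \dfrac{12!}{9!}$

(c) $\dfrac{12 \cdot 11 \cdot 10}{3 \cdot 2 \cdot 1} = 12 \cdot 11 \cdot 10 \cdot \dfrac{1}{3!} = \dfrac{12!}{3!\,9!}$

(d) $n(n-1)\cdots(n-r+1) = \dfrac{n(n-1)\cdots(n-r+1)(n-r)(n-r-1)\cdots 3 \cdot 2 \cdot 1}{(n-r)(n-r-1)\cdots 3 \cdot 2 \cdot 1} = \dfrac{n!}{(n-r)!}$

(e) Using (d), we get:

$$\frac{n(n-1)\cdots(n-r+1)}{r(r-1)\cdots 3 \cdot 2 \cdot 1} = n(n-1)\cdots(n-r+1) \cdot \frac{1}{r!} = \frac{n!}{(n-r)!} \cdot \frac{1}{r!} = \frac{n!}{r!\,(n-r)!}$$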


Stirling’s Approximation to n! 


A direct evaluation of n! becomes impractical when n is very large, since the number of digits grows rapidly.
Accordingly, one frequently uses the approximation formula

$$n! \sim \sqrt{2\pi n}\, n^n e^{-n}$$

(Here e = 2.718 28. ...) The symbol ~ means that as n gets larger and larger (that is, as $n \to \infty$), the
ratio of both sides approaches 1.
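
For a moderate value such as n = 50 the exact factorial is still available, so the quality of the approximation can be inspected numerically. The sketch below assumes the formula $\sqrt{2\pi n}\, n^n e^{-n}$; the function names are our own.

```python
import math


def stirling(n: int) -> float:
    """Stirling's approximation sqrt(2*pi*n) * n**n * e**(-n)."""
    return math.sqrt(2 * math.pi * n) * n**n * math.exp(-n)


def log10_stirling(n: int) -> float:
    """Base-10 logarithm of the approximation, usable when n**n would overflow a float."""
    return 0.5 * math.log10(2 * math.pi * n) + n * math.log10(n) - n * math.log10(math.e)


approx = stirling(50)
exact = math.factorial(50)
ratio = approx / exact  # slightly below 1, and closer to 1 as n grows

print(f"{approx:.3e}")  # about 3.04e+64
```

Working with logarithms, as in `log10_stirling`, is the same device used with log tables: it keeps the numbers small even when n! itself is enormous.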


2.4 BINOMIAL COEFFICIENTS 


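The symbol $\binom{n}{r}$, where r and n are positive integers with $r \le n$ [read: "nCr" or "n choose r"],
is defined as follows:

$$\binom{n}{r} = \frac{n(n-1)\cdots(n-r+1)}{r(r-1)\cdots 3 \cdot 2 \cdot 1} \quad \text{or, equivalently,} \quad \binom{n}{r} = \frac{n!}{r!\,(n-r)!}$$

The equivalence of the two formulas is shown in Example 2.3(e).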
Note that $n - (n - r) = r$. This yields the following important relation:

Lemma 2.1: $\binom{n}{n-r} = \binom{n}{r}$ or, equivalently, $\binom{n}{a} = \binom{n}{b}$ where $a + b = n$.

Remark: Motivated by the second formula for $\binom{n}{r}$ and the fact that $0! = 1$, we define:

$$\binom{n}{0} = \frac{n!}{0!\,n!} = 1 \quad \text{and, in particular,} \quad \binom{0}{0} = \frac{0!}{0!\,0!} = 1$$


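EXAMPLE 2.4

(a) $\binom{8}{2} = \dfrac{8 \cdot 7}{2 \cdot 1} = 28$; $\binom{9}{4} = \dfrac{9 \cdot 8 \cdot 7 \cdot 6}{4 \cdot 3 \cdot 2 \cdot 1} = 126$; $\binom{12}{5} = \dfrac{12 \cdot 11 \cdot 10 \cdot 9 \cdot 8}{5 \cdot 4 \cdot 3 \cdot 2 \cdot 1} = 792$;

$\binom{10}{3} = \dfrac{10 \cdot 9 \cdot 8}{3 \cdot 2 \cdot 1} = 120$; $\binom{13}{1} = \dfrac{13}{1} = 13$

Note that $\binom{n}{r}$ has exactly r factors in both the numerator and the denominator.

(b) Suppose we want to compute $\binom{10}{7}$. By definition,

$$\binom{10}{7} = \frac{10 \cdot 9 \cdot 8 \cdot 7 \cdot 6 \cdot 5 \cdot 4}{7 \cdot 6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1} = 120$$

On the other hand, 10 - 7 = 3; hence using Lemma 2.1 we get:

$$\binom{10}{7} = \binom{10}{3} = \frac{10 \cdot 9 \cdot 8}{3 \cdot 2 \cdot 1} = 120$$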


Observe that the second method saves both space and time. 
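
The same shortcut is easy to build into a program. The sketch below (the name `binom` is our own) uses r factors in the numerator and denominator and applies Lemma 2.1 first so that r is never more than half of n; `math.comb`, available in Python 3.8 and later, serves as a cross-check.

```python
import math


def binom(n: int, r: int) -> int:
    """Binomial coefficient for 0 <= r <= n, computed with r factors top and bottom."""
    if not 0 <= r <= n:
        raise ValueError("this sketch assumes 0 <= r <= n")
    r = min(r, n - r)  # Lemma 2.1: C(n, r) = C(n, n - r)
    numerator = 1
    denominator = 1
    for i in range(r):
        numerator *= n - i      # n (n-1) ... (n-r+1)
        denominator *= i + 1    # 1 * 2 * ... * r
    return numerator // denominator


print(binom(10, 7))  # 120, computed as 10*9*8 / (3*2*1)
assert binom(10, 7) == math.comb(10, 7)
```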


Binomial Coefficients and Pascal’s Triangle 


The numbers $\binom{n}{r}$ are called the binomial coefficients since they appear as the coefficients in the
expansion of $(a + b)^n$. Specifically, the following Binomial Theorem gives the general expression for
the expansion of $(a + b)^n$:

Theorem 2.2 (Binomial Theorem): $(a + b)^n = \displaystyle\sum_{k=0}^{n} \binom{n}{k} a^{n-k} b^k$


This theorem is proved in Problem 2.34 using mathematical induction. 


The coefficients of the successive powers of a + b can be arranged in a triangular array of numbers, 
called Pascal’s triangle, as pictured in Fig. 2-1. The numbers in Pascal’s triangle have the following 
interesting properties: 


(i) The first and last number in each row is 1. 


(ii) Every other number in the array can be obtained by adding the two numbers appearing 
directly above it. For example, 10 = 4 +6, 15 =5+ 10, 20 = 10+ 10. 
Since the numbers appearing in Pascal's triangle are the binomial coefficients, property (ii) of
Pascal's triangle comes from the following theorem (proved in Problem 2.7):

Theorem 2.3: $\binom{n+1}{r} = \binom{n}{r-1} + \binom{n}{r}$

(a + b)^0 = 1                                                            1
(a + b)^1 = a + b                                                       1 1
(a + b)^2 = a^2 + 2ab + b^2                                            1 2 1
(a + b)^3 = a^3 + 3a^2b + 3ab^2 + b^3                                 1 3 3 1
(a + b)^4 = a^4 + 4a^3b + 6a^2b^2 + 4ab^3 + b^4                      1 4 6 4 1
(a + b)^5 = a^5 + 5a^4b + 10a^3b^2 + 10a^2b^3 + 5ab^4 + b^5        1 5 10 10 5 1
(a + b)^6 = a^6 + 6a^5b + 15a^4b^2 + 20a^3b^3 + 15a^2b^4 + 6ab^5 + b^6    1 6 15 20 15 6 1

Fig. 2-1. Pascal's triangle.
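
Properties (i) and (ii) are enough to generate the triangle row by row. The following is a minimal sketch; the function name `pascal_rows` is an assumption made for this illustration.

```python
def pascal_rows(n_max: int) -> list[list[int]]:
    """Return rows 0..n_max of Pascal's triangle, built with Theorem 2.3."""
    rows = [[1]]
    for _ in range(n_max):
        prev = rows[-1]
        # Property (i): each row starts and ends with 1.
        # Property (ii): every inner entry is the sum of the two entries above it.
        inner = [prev[i - 1] + prev[i] for i in range(1, len(prev))]
        rows.append([1] + inner + [1])
    return rows


for row in pascal_rows(6):
    print(row)
# The last row printed is [1, 6, 15, 20, 15, 6, 1], the coefficients of (a + b)^6.
```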


2.5 PERMUTATIONS 


Any arrangement of a set of n objects in a given order is called a permutation of the objects (taken
all at a time). Any arrangement of any $r \le n$ of these objects in a given order is called an r
permutation or a permutation of the n objects taken r at a time. Consider, for example, the set of letters
a, b, c, d. Then:
(i) bdca, dcba, acdb are permutations of the four letters (taken all at a time).
(ii) bad, adb, cbd, bca are permutations of the four letters taken three at a time.
(iii) ad, cb, da, bd are permutations of the four letters taken two at a time.


The number of permutations of n objects taken r at a time will be denoted by
P(n, r)
Before we derive the general formula for P(n, r) we consider a particular case.
EXAMPLE 2.5 Find the number of permutations of six objects, say A, B, C, D, E, F, taken three at a time. In
other words, find the number of "three-letter words" using only the given six letters without repetitions.

Let the general three-letter word be represented by the following three boxes:

[ ] [ ] [ ]

The first letter can be chosen in 6 different ways; following this, the second letter can be chosen in 5 different ways;
and, following this, the last letter can be chosen in 4 different ways. Write each number in the appropriate box
as follows:

[6] [5] [4]

Accordingly, by the product rule principle, there are 6 · 5 · 4 = 120 possible three-letter words without repetitions
from the six letters, or there are 120 permutations of six objects taken three at a time. Thus, we have
shown that

P(6, 3) = 120
Derivation of the Formula for P(n, r)


The derivation of the formula for the number of permutations of n objects taken r at a time, or
the number of r permutations of n objects, P(n, r), follows the procedure in the preceding example. The
first element in an r permutation of n objects can be chosen in n different ways; following this, the
second element in the permutation can be chosen in n - 1 ways; and, following this, the third element
in the permutation can be chosen in n - 2 ways. Continuing in this manner, we have that the rth (last)
element in the r permutation can be chosen in n - (r - 1) = n - r + 1 ways. Thus, by the fundamental
principle of counting (the product rule), we have

$$P(n, r) = n(n-1)(n-2)\cdots(n-r+1)$$

By Example 2.3(d), we see that

$$n(n-1)(n-2)\cdots(n-r+1) = \frac{n(n-1)(n-2)\cdots(n-r+1) \cdot (n-r)!}{(n-r)!} = \frac{n!}{(n-r)!}$$

Thus, we have proven the following theorem.

Theorem 2.4: $P(n, r) = \dfrac{n!}{(n-r)!}$


Consider the case that r = n. We get

$$P(n, n) = n(n-1)(n-2)\cdots 3 \cdot 2 \cdot 1 = n!$$

Accordingly,
Corollary 2.5: There are n! permutations of n objects (taken all at a time).
For example, there are 3! = 1 · 2 · 3 = 6 permutations of the three letters a, b, c. These are


abc, acb, bac, bca, cab, cba 
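
Small cases like these can be listed by machine and compared with the formula for P(n, r). The sketch below relies on `itertools.permutations` and `math.perm` (the latter requires Python 3.8 or later); the variable names are our own.

```python
import math
from itertools import permutations

# All permutations of the three letters a, b, c (Corollary 2.5 predicts 3! = 6).
all_abc = ["".join(p) for p in permutations("abc")]
print(all_abc)  # ['abc', 'acb', 'bac', 'bca', 'cab', 'cba']

# Three-letter words from six letters without repetition (P(6, 3) = 120).
words = ["".join(p) for p in permutations("ABCDEF", 3)]
assert len(words) == math.perm(6, 3) == 6 * 5 * 4

# Theorem 2.4: P(n, r) = n! / (n - r)!
assert math.perm(6, 3) == math.factorial(6) // math.factorial(6 - 3)
```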


Permutations with Repetitions 


Frequently we want to know the number of permutations of a multiset, that is, a set of objects some
of which are alike. We will let

$$P(n; n_1, n_2, \ldots, n_r)$$

denote the number of permutations of n objects of which $n_1$ are alike, $n_2$ are alike, ..., $n_r$ are
alike. The general formula follows:

Theorem 2.6: $P(n; n_1, n_2, \ldots, n_r) = \dfrac{n!}{n_1!\, n_2! \cdots n_r!}$


We indicate the proof of the above theorem by a particular example. Suppose we want to form
all possible five-letter "words" using the letters from the word "BABBY". Now there are 5! = 120
permutations of the objects $B_1$, A, $B_2$, $B_3$, Y, where we have distinguished the three B's. Observe that
the following 6 permutations produce the same word when the subscripts are removed:

$$B_1B_2B_3AY, \quad B_1B_3B_2AY, \quad B_2B_1B_3AY, \quad B_2B_3B_1AY, \quad B_3B_1B_2AY, \quad B_3B_2B_1AY$$

The 6 comes from the fact that there are 3! = 3 · 2 · 1 = 6 different ways of placing the three B's in the
first three positions in the permutation. This is true for each set of three positions in which the three
B's can appear. Accordingly, there are

$$P(5; 3) = \frac{5!}{3!} = \frac{120}{6} = 20$$

different five-letter words that can be formed using the letters from the word "BABBY".
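
The count can be confirmed by brute force: generate all 5! arrangements and discard duplicates. In the sketch below, `multiset_permutations` is a hypothetical helper that applies the formula of Theorem 2.6.

```python
import math
from collections import Counter
from itertools import permutations


def multiset_permutations(word: str) -> int:
    """Number of distinct arrangements of the letters: n! / (n1! n2! ... nr!)."""
    count = math.factorial(len(word))
    for multiplicity in Counter(word).values():
        count //= math.factorial(multiplicity)
    return count


# Brute force: a set keeps each distinct arrangement of BABBY exactly once.
distinct_words = {"".join(p) for p in permutations("BABBY")}

print(multiset_permutations("BABBY"), len(distinct_words))  # 20 20
```

The brute-force check is only practical for short words, since the number of arrangements to generate grows like n!.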
EXAMPLE 2.6


(a) Find the number m of seven-letter words that can be formed using the letters of the word "BENZENE".
We seek the number of permutations of seven objects of which three are alike, the three E's, and two
are alike, the two N's. By Theorem 2.6,

$$m = P(7; 3, 2) = \frac{7!}{3!\,2!} = \frac{7 \cdot 6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{3 \cdot 2 \cdot 1 \cdot 2 \cdot 1} = 420$$

(b) Find the number m of different signals, each consisting of eight flags in a vertical line, that can be formed
from four indistinguishable red flags, three indistinguishable white flags, and a blue flag.

We seek the number of permutations of eight objects of which four are alike, the red flags, and three
are alike, the white flags. By Theorem 2.6,

$$m = P(8; 4, 3) = \frac{8!}{4!\,3!} = \frac{8 \cdot 7 \cdot 6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{4 \cdot 3 \cdot 2 \cdot 1 \cdot 3 \cdot 2 \cdot 1} = 280$$


Ordered Samples 


Many problems in combinatorial analysis and, in particular, probability are concerned with
choosing an element from a set S containing n elements (or a card from a deck or a person from a
population). When we choose one element after another from the set S, say r times, we call the choice
an ordered sample of size r. We consider two cases:


(i) Sampling with Replacement: Here the element is replaced in the set S before the next element
is chosen. Since there are n different ways to choose each element (repetitions are allowed),
the product rule principle tells us that there are

$$\underbrace{n \cdot n \cdot n \cdots n}_{r \text{ times}} = n^r$$

different ordered samples with replacement of size r.


(ii) Sampling without Replacement: Here the element is not replaced in the set S before the next
element is chosen. Thus, there are no repetitions in the ordered sample. Accordingly, an
ordered sample of size r without replacement is simply an r permutation of the elements in the
set S with n elements. Thus, there are

$$P(n, r) = n(n-1)(n-2)\cdots(n-r+1) = \frac{n!}{(n-r)!}$$

different ordered samples without replacement of size r from a population (set) with n
elements. In other words, by the product rule, the first element can be chosen in n ways, the
second in n - 1 ways, and so on.


EXAMPLE 2.7 Three cards are chosen in succession from a deck with 52 cards. Find the number of ways this 
can be done: (a) with replacement, (b) without replacement. 


(a) Since each card is replaced before the next card is chosen, each card can be chosen in 52 ways. Thus,
52(52)(52) = 52^3 = 140,608


is the number of different ordered samples of size r = 3 with replacement. 


(b) Since there is no replacement, the first card can be chosen in 52 ways, the second card in 51 ways, and the 
last card in 50 ways. Thus, 


P(52, 3) = 52(51)(50) = 132,600 


is the number of different ordered samples of size r = 3 without replacement. 
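
Both counts can be reproduced in a few lines, and the two sampling schemes can be enumerated explicitly for a small set. This is a sketch under our own naming; the four-element set is a hypothetical stand-in chosen so that the full lists stay small.

```python
import math
from itertools import permutations, product

deck_size, r = 52, 3

with_replacement = deck_size**r                # n^r ordered samples
without_replacement = math.perm(deck_size, r)  # P(n, r) ordered samples
print(with_replacement, without_replacement)   # 140608 132600

# Explicit enumeration for a small set S with n = 4 and samples of size r = 2.
S = ["a", "b", "c", "d"]
samples_with = list(product(S, repeat=2))      # repetitions allowed
samples_without = list(permutations(S, 2))     # no repetitions
assert len(samples_with) == 4**2
assert len(samples_without) == 4 * 3
```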
2.6 COMBINATIONS


Suppose we have a collection of n objects. A combination of these n objects taken r at a time is 
any selection of r of the objects where order doesn’t count. In other words, an r combination of a set 
of n objects is any subset of r elements. For example, the combinations of the letters a, b, c, d taken 
three at a time are 


{a, b,c}, {a, b, d}, {a, c,d}, {b,c, d} or simply abc, abd, acd, bcd 
Observe that the following combinations are equal: 
abc, acb, bac, bca, cab, cba 
That is, each denotes the same set {a, b, c}. 
The number of combinations of n objects taken r at a time will be denoted by 
C(n, r) 


Before we derive the general formula for C(n,r), we consider a particular case. 


EXAMPLE 2.8 Find the number of combinations of four objects, a, b, c, d, taken three at a time.

Each combination consisting of three objects determines 3! = 6 permutations of the objects in the
combination as pictured in Fig. 2-2. Thus, the number of combinations multiplied by 3! equals the number of
permutations. That is,

$$C(4, 3) \cdot 3! = P(4, 3) \quad \text{or} \quad C(4, 3) = \frac{P(4, 3)}{3!}$$

But P(4, 3) = 4 · 3 · 2 = 24 and 3! = 6. Thus, C(4, 3) = 4, which is noted in Fig. 2-2.


Combinations    Permutations

abc             abc, acb, bac, bca, cab, cba
abd             abd, adb, bad, bda, dab, dba
acd             acd, adc, cad, cda, dac, dca
bcd             bcd, bdc, cbd, cdb, dbc, dcb

Fig. 2-2


Formula for C(n, r) 


Since any combination of n objects taken r at a time determines r! permutations of the objects in 
the combination, we can conclude that 


P(n,r) = r!C(n, r) 




Thus, we obtain the following formula for C(n, r):

Theorem 2.7: $C(n, r) = \dfrac{P(n, r)}{r!} = \dfrac{n!}{r!\,(n-r)!}$

Recall that the binomial coefficient $\binom{n}{r}$ was defined to be $\dfrac{n!}{r!\,(n-r)!}$. Accordingly,

$$C(n, r) = \binom{n}{r}$$

We shall use C(n, r) and $\binom{n}{r}$ interchangeably.


EXAMPLE 2.9


(a) Find the number m of committees of 3 that can be formed from 8 people.
Each committee is, essentially, a combination of the 8 people taken 3 at a time. Thus

$$m = C(8, 3) = \binom{8}{3} = \frac{8 \cdot 7 \cdot 6}{3 \cdot 2 \cdot 1} = 56$$

(b) A farmer buys 3 cows, 2 pigs, and 4 hens from a person who has 6 cows, 5 pigs, and 8 hens. How many
choices does the farmer have?

The farmer can choose the cows in $\binom{6}{3}$ ways, the pigs in $\binom{5}{2}$ ways, and the hens in $\binom{8}{4}$ ways.
Accordingly, altogether the farmer can choose the animals in

$$\binom{6}{3}\binom{5}{2}\binom{8}{4} = \frac{6 \cdot 5 \cdot 4}{3 \cdot 2 \cdot 1} \cdot \frac{5 \cdot 4}{2 \cdot 1} \cdot \frac{8 \cdot 7 \cdot 6 \cdot 5}{4 \cdot 3 \cdot 2 \cdot 1} = 20 \cdot 10 \cdot 70 = 14{,}000 \text{ ways}$$

EXAMPLE 2.10 Find the number m of ways that 9 toys can be divided between 4 children if the youngest is to 
receive 3 toys and each of the others 2 toys. 


There are C(9,3) = 84 ways to first choose 3 toys for the youngest. Then there are C(6,2) = 15 ways to 
choose 2 of the remaining 6 toys for the oldest. Next, there are C(4,2) = 6 ways to choose 2 of the remaining 
4 toys for the second oldest. The third oldest receives the remaining 2 toys. Thus, by the product rule, 


m = 84(15)(6) = 7560 


Alternately, by Problem 2.37,

$$m = \frac{9!}{3!\,2!\,2!\,2!} = 7560$$


EXAMPLE 2.11 Find the number m of ways that 12 students can be partitioned into 3 teams, $T_1$, $T_2$, $T_3$, so that
each team contains 4 students.


Method 1: Let A be one of the students. Then there are C(11, 3) ways to choose 3 other students to be on
the same team as A. Now let B denote a student who is not on the same team as A; then there are C(7, 3)
ways to choose 3 students out of the remaining students to be on the same team as B. The remaining 4
students constitute the third team. Thus, altogether, the number m of ways to partition the students is as
follows:

$$m = C(11, 3) \cdot C(7, 3) = \binom{11}{3} \cdot \binom{7}{3} = 165 \cdot 35 = 5775$$

Method 2: Each partition $[T_1, T_2, T_3]$ of the students can be arranged in 3! = 6 ways as an ordered
partition. By Problem 2.37 (or using the method in Example 2.10), there are

$$\frac{12!}{4!\,4!\,4!} = 34{,}650$$

such ordered partitions. Thus, there are m = 34,650/6 = 5775 (unordered) partitions.
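
The two methods, and the toy count of Example 2.10, can be compared numerically. The helper `ordered_partitions` below is our own name for the quotient $n!/(n_1!\,n_2! \cdots n_k!)$ that appears in both examples; it is a sketch, not the statement of Problem 2.37.

```python
import math


def ordered_partitions(n: int, sizes: list[int]) -> int:
    """n! / (n1! n2! ... nk!) for cell sizes that add up to n."""
    if sum(sizes) != n:
        raise ValueError("cell sizes must add up to n")
    count = math.factorial(n)
    for size in sizes:
        count //= math.factorial(size)
    return count


# Example 2.10: 9 toys, the youngest child gets 3, the others 2 each.
toys = ordered_partitions(9, [3, 2, 2, 2])
assert toys == math.comb(9, 3) * math.comb(6, 2) * math.comb(4, 2)

# Example 2.11: 12 students in 3 unlabeled teams of 4.
method_1 = math.comb(11, 3) * math.comb(7, 3)
method_2 = ordered_partitions(12, [4, 4, 4]) // math.factorial(3)

print(toys, method_1, method_2)  # 7560 5775 5775
```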


2.7 TREE DIAGRAMS 


A tree diagram is a device used to enumerate all the possible outcomes of a sequence of 
experiments or events where each event can occur in a finite number of ways. The construction of 
tree diagrams is illustrated in the following examples. 


EXAMPLE 2.12 Find the product set A × B × C where

A = {1, 2}, B = {a, b, c}, C = {3, 4}

The tree diagram for A × B × C appears in Fig. 2-3. Observe that the tree is constructed from left to
right and that the number of branches at each point corresponds to the number of possible outcomes of the next
event. Each endpoint of the tree is labeled by the corresponding element of A × B × C. As expected from
Theorem 1.11, A × B × C contains n = 2 · 3 · 2 = 12 elements.

1    a    3    (1, a, 3)
          4    (1, a, 4)
     b    3    (1, b, 3)
          4    (1, b, 4)
     c    3    (1, c, 3)
          4    (1, c, 4)
2    a    3    (2, a, 3)
          4    (2, a, 4)
     b    3    (2, b, 3)
          4    (2, b, 4)
     c    3    (2, c, 3)
          4    (2, c, 4)

Fig. 2-3
EXAMPLE 2.13 Marc and Erik are to play a tennis tournament. The first person to win 2 games in a row or
who wins a total of 3 games wins the tournament. Find the number of ways the tournament can occur.


The tree diagram showing the possible outcomes of the tournament appears in Fig. 2-4. Specifically, there 
are 10 endpoints which correspond to the following 10 ways that the tournament can occur: 


MM, MEMM, MEMEM, MEMEE, MEE, EMM, EMEMM, EMEME, EMEE, EE 


Fig. 2-4 


The path from the beginning of the tree to the endpoint describes who won which game in the individual 
tournament. 
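
A tree diagram of this kind corresponds to a depth-first enumeration: each recursive call extends the path by one game until a stopping rule is met. The generator below is a sketch with names of our own choosing; M and E stand for a game won by Marc or Erik.

```python
def tournaments(history: str = ""):
    """Yield every complete sequence of game winners, one per endpoint of the tree."""
    if history:
        last = history[-1]
        two_in_a_row = len(history) >= 2 and history[-2] == last
        three_total = history.count(last) == 3
        if two_in_a_row or three_total:
            yield history  # an endpoint: the last winner takes the tournament
            return
    for winner in "ME":
        # Branch on who wins the next game.
        yield from tournaments(history + winner)


outcomes = list(tournaments())
print(len(outcomes))  # 10
print(outcomes)
```

Only the winner of the most recent game can have just met a stopping rule, which is why the check looks at `history[-1]` alone.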


Solved Problems 


FACTORIAL NOTATION AND BINOMIAL COEFFICIENTS 
2.1. Compute: (a) 4!, 5!, 6!, 7!, 8!, 9!, 10!; (b) 50!


(a) Use (n + 1)! = (n + 1)n! after calculating 4! and 5!:

4! = 1 · 2 · 3 · 4 = 24
5! = 1 · 2 · 3 · 4 · 5 = 5(24) = 120
6! = 6(5!) = 6(120) = 720
7! = 7(6!) = 7(720) = 5040
8! = 8(7!) = 8(5040) = 40,320
9! = 9(8!) = 9(40,320) = 362,880
10! = 10(9!) = 10(362,880) = 3,628,800


(b) Since n is very large, we use Stirling's approximation that $n! \sim \sqrt{2\pi n}\, n^n e^{-n}$ (where e ≈ 2.718). Let

$$N = \sqrt{100\pi}\, 50^{50} e^{-50} \approx 50!$$

Evaluating N using a calculator, we get $N = 3.04 \times 10^{64}$ (which has 65 digits).
Alternately, using (base 10) logarithms, we get

$$\begin{aligned}
\log N &= \log\left(\sqrt{100\pi}\, 50^{50} e^{-50}\right) \\
&= \tfrac{1}{2}\log 100 + \tfrac{1}{2}\log \pi + 50 \log 50 - 50 \log e \\
&= \tfrac{1}{2}(2) + \tfrac{1}{2}(0.497\,2) + 50(1.699\,0) - 50(0.434\,3) \\
&= 64.483\,6
\end{aligned}$$

The antilog yields $N = 3.04 \times 10^{64}$.
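2.2. Compute: (a) $\dfrac{11!}{9!}$, (b) $\dfrac{6!}{9!}$.

(a) $\dfrac{11!}{9!} = \dfrac{11 \cdot 10 \cdot 9 \cdot 8 \cdot 7 \cdot 6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{9 \cdot 8 \cdot 7 \cdot 6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1} = 11 \cdot 10 = 110$

Alternately, this could be solved as follows:

$$\frac{11!}{9!} = \frac{11 \cdot 10 \cdot 9!}{9!} = 11 \cdot 10 = 110$$

(b) $\dfrac{6!}{9!} = \dfrac{6!}{9 \cdot 8 \cdot 7 \cdot 6!} = \dfrac{1}{9 \cdot 8 \cdot 7} = \dfrac{1}{504}$


2.3. Simplify: (a) $\dfrac{n!}{(n-1)!}$, (b) $\dfrac{(n+2)!}{n!}$.

(a) $\dfrac{n!}{(n-1)!} = \dfrac{n(n-1)(n-2)\cdots 3 \cdot 2 \cdot 1}{(n-1)(n-2)\cdots 3 \cdot 2 \cdot 1} = n$

or simply $\dfrac{n!}{(n-1)!} = \dfrac{n(n-1)!}{(n-1)!} = n$

(b) $\dfrac{(n+2)!}{n!} = \dfrac{(n+2)(n+1)n(n-1)\cdots 3 \cdot 2 \cdot 1}{n(n-1)\cdots 3 \cdot 2 \cdot 1} = (n+2)(n+1) = n^2 + 3n + 2$

or simply $\dfrac{(n+2)!}{n!} = \dfrac{(n+2)(n+1)n!}{n!} = (n+2)(n+1) = n^2 + 3n + 2$


2.4. Compute: (a) $\binom{14}{3}$, (b) $\binom{11}{4}$.

Recall that there are as many factors in the numerator as in the denominator.

(a) $\binom{14}{3} = \dfrac{14 \cdot 13 \cdot 12}{3 \cdot 2 \cdot 1} = 364$, (b) $\binom{11}{4} = \dfrac{11 \cdot 10 \cdot 9 \cdot 8}{4 \cdot 3 \cdot 2 \cdot 1} = 330$


2.5. Compute: (a) $\binom{8}{6}$, (b) $\binom{10}{7}$.

(a) $\binom{8}{6} = \dfrac{8 \cdot 7 \cdot 6 \cdot 5 \cdot 4 \cdot 3}{6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1} = 28$

or, since 8 - 6 = 2, we can use Lemma 2.1 to obtain:

$$\binom{8}{6} = \binom{8}{2} = \frac{8 \cdot 7}{2 \cdot 1} = 28$$

(b) Since 10 - 7 = 3, Lemma 2.1 tells us that

$$\binom{10}{7} = \binom{10}{3} = \frac{10 \cdot 9 \cdot 8}{3 \cdot 2 \cdot 1} = 120$$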

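2.6. Prove: $\binom{17}{6} = \binom{16}{5} + \binom{16}{6}$.

Now $\binom{16}{5} + \binom{16}{6} = \dfrac{16!}{5!\,11!} + \dfrac{16!}{6!\,10!}$. Multiply the first fraction by $\dfrac{6}{6}$ and the second by $\dfrac{11}{11}$ to obtain
the same denominator in both fractions; and then add:

$$\begin{aligned}
\binom{16}{5} + \binom{16}{6} &= \frac{6 \cdot 16!}{6 \cdot 5! \cdot 11!} + \frac{11 \cdot 16!}{6! \cdot 11 \cdot 10!} = \frac{6 \cdot 16!}{6! \cdot 11!} + \frac{11 \cdot 16!}{6! \cdot 11!} \\
&= \frac{6 \cdot 16! + 11 \cdot 16!}{6! \cdot 11!} = \frac{(6 + 11) \cdot 16!}{6! \cdot 11!} = \frac{17 \cdot 16!}{6! \cdot 11!} = \frac{17!}{6! \cdot 11!} = \binom{17}{6}
\end{aligned}$$


2.7. Prove Theorem 2.3: $\binom{n+1}{r} = \binom{n}{r-1} + \binom{n}{r}$.

(The technique in this proof is similar to that of the preceding problem.)

Now $\binom{n}{r-1} + \binom{n}{r} = \dfrac{n!}{(r-1)! \cdot (n-r+1)!} + \dfrac{n!}{r! \cdot (n-r)!}$. To obtain the same denominator in
both fractions, multiply the first fraction by $\dfrac{r}{r}$ and the second fraction by $\dfrac{n-r+1}{n-r+1}$. Hence

$$\begin{aligned}
\binom{n}{r-1} + \binom{n}{r} &= \frac{r \cdot n!}{r \cdot (r-1)! \cdot (n-r+1)!} + \frac{(n-r+1) \cdot n!}{r! \cdot (n-r+1) \cdot (n-r)!} \\
&= \frac{r \cdot n!}{r!\,(n-r+1)!} + \frac{(n-r+1) \cdot n!}{r!\,(n-r+1)!} \\
&= \frac{r \cdot n! + (n-r+1) \cdot n!}{r!\,(n-r+1)!} = \frac{[r + (n-r+1)] \cdot n!}{r!\,(n-r+1)!} \\
&= \frac{(n+1)\,n!}{r!\,(n-r+1)!} = \frac{(n+1)!}{r!\,(n-r+1)!} = \binom{n+1}{r}
\end{aligned}$$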


COUNTING PRINCIPLES 


2.8. Suppose a bookcase shelf has 5 history texts, 3 sociology texts, 6 anthropology texts, and 4
psychology texts. Find the number n of ways a student can choose: (a) one of the texts; (b) one
of each type of text.

(a) Here the sum rule applies; hence n = 5 + 3 + 6 + 4 = 18.

(b) Here the product rule applies; hence n = 5 · 3 · 6 · 4 = 360.


2.9. A restaurant has a menu with 4 appetizers, 5 entrees, and 2 desserts. Find the number n of
ways a customer can order an appetizer, entree, and dessert.

Here the product rule applies since the customer orders one of each. Thus n = 4 · 5 · 2 = 40.


2.10. A history class contains 8 male students and 6 female students. Find the number n of ways that
the class can elect: (a) 1 class representative; (b) 2 class representatives, 1 male and 1 female;
(c) 1 president and 1 vice-president.

(a) Here the sum rule is used; hence n = 8 + 6 = 14.

(b) Here the product rule is used; hence n = 8 · 6 = 48.

(c) There are 14 ways to elect the president, and then 13 ways to elect the vice-president. Thus,
n = 14 · 13 = 182.
2.11. There are 5 bus lines from city A to city B and 4 bus lines from city B to city C. Find the
number n of ways a person can travel by bus:

(a) from A to C by way of B, (b) round-trip from A to C by way of B,
(c) round-trip from A to C by way of B, without using a bus line more than once.

(a) There are 5 ways to go from A to B and 4 ways to go from B to C; hence, by the product rule,
n = 5 · 4 = 20.

(b) There are 20 ways to go from A to C by way of B and 20 ways to return. Thus, by the product rule,
n = 20 · 20 = 400.

(c) The person will travel from A to B to C to B to A. Enter these letters with connecting arrows as
follows:

$$A \to B \to C \to B \to A$$

There are 5 ways to go from A to B and 4 ways to go from B to C. Since a bus line is not to be used
more than once, there are only 3 ways to go from C back to B and only 4 ways to go from B back
to A. Enter these numbers above the corresponding arrows as follows:

$$A \xrightarrow{5} B \xrightarrow{4} C \xrightarrow{3} B \xrightarrow{4} A$$

Thus, by the product rule, n = 5 · 4 · 3 · 4 = 240.


2.12. Suppose there are 12 married couples at a party. Find the number n of ways of choosing a man
and a woman from the party such that the two are: (a) married to each other, (b) not married
to each other.

(a) There are 12 married couples and hence there are n = 12 ways to choose one of the couples.

(b) There are 12 ways to choose, say, one of the men. Once the man is chosen, there are 11 ways to
choose the woman, anyone other than his wife. Thus, n = 12(11) = 132.


2.13. Suppose a password consists of 4 characters, the first 2 being letters in the (English) alphabet
and the last 2 being digits. Find the number n of:

(a) passwords, (b) passwords beginning with a vowel

(a) There are 26 ways to choose each of the first 2 characters and 10 ways to choose each of the last 2
characters. Thus, by the product rule,
n = 26 · 26 · 10 · 10 = 67,600
(b) Here there are only 5 ways to choose the first character. Hence n = 5 · 26 · 10 · 10 = 13,000.


PERMUTATIONS AND ORDERED SAMPLES 


2.14. 


2.15. 


State the essential difference between permutations and combinations, with examples. 


Order counts with permutations, such as words, sitting in a row, and electing a president, 
vice-president, and treasurer. Order does not count with combinations, such as committees and teams 
(without counting positions). The product rule is usually used with permutations since the choice for 
each of the ordered positions may be viewed as a sequence of events. 


Find the number 7 of ways that 4 people can sit in a row of 4 seats. 
The 4 empty seats may be pictured by 


’ > ’ 


The first seat can be occupied by any one of the 4 people, that is, there are 4 ways to fill the first 
seat. After the first person sits down, there are only 3 people left and so there are 3 ways to fill the second 
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seat. Similarly, the third seat can be filled in 2 ways, and the last seat in 1 way. This is pictured by 
A, Be oy Dg 


Thus, by the product rule,n = 4-3-2+1= 4! = 24. 
Alternately, n is the number of permutations of 4 things taken 4 at a time, and so 


n= P(4,4) = 4! = 24 


2.16. A family has 3 boys and 2 girls. (a) Find the number of ways they can sit in a row. (b) How
many ways are there if the boys and girls are each to sit together?


(a) The 5 children can sit in a row in 5 · 4 · 3 · 2 · 1 = 5! = 120 ways.


(b) There are 2 ways to distribute them according to sex: BBBGG or GGBBB. In each case, the boys
can sit in 3 · 2 · 1 = 3! = 6 ways and the girls can sit in 2 · 1 = 2! = 2 ways. Thus, altogether, there
are 2 · 3! · 2! = 2 · 6 · 2 = 24 ways.


2.17. Find the number n of distinct permutations that can be formed from all the letters of each word:
(a) THOSE, (b) UNUSUAL, (c) SOCIOLOGICAL.


This problem concerns permutations with repetitions.


(a) n = 5! = 120, since there are 5 letters and no repetitions.


(b) n = 7!/3! = 840, since there are 7 letters of which 3 are U and no other letter is repeated.


(c) n = 12!/(3! 2! 2! 2!), since there are 12 letters of which 3 are O, 2 are C, 2 are I, and 2 are L.
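
The formula for permutations with repetitions used in Problem 2.17 can be checked mechanically. The following is a minimal sketch in Python; the function name `distinct_permutations` is an illustrative choice and not part of the text. It divides n! by the factorial of each letter's multiplicity and cross-checks the two shorter words by brute force.

```python
from collections import Counter
from itertools import permutations
from math import factorial


def distinct_permutations(word: str) -> int:
    """Count distinct arrangements of all letters: n! / (n1! n2! ... nk!)."""
    count = factorial(len(word))
    for multiplicity in Counter(word).values():
        count //= factorial(multiplicity)  # exact at every step
    return count


for word in ("THOSE", "UNUSUAL", "SOCIOLOGICAL"):
    print(word, distinct_permutations(word))

# Brute-force cross-check on the shorter words (12 letters would be too slow).
assert distinct_permutations("THOSE") == len(set(permutations("THOSE")))
assert distinct_permutations("UNUSUAL") == len(set(permutations("UNUSUAL")))
```

For SOCIOLOGICAL the quotient 12!/(3! 2! 2! 2!) evaluates to 9,979,200.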


2.18. Find the number n of different signals, each consisting of 6 flags hung in a vertical line, that can 
be formed from 4 identical red flags and 2 identical blue flags. 


This problem concerns permutations with repetitions. Thus, n = 6!/(4! 2!) = 15, since there are 6 flags of
which 4 are red and 2 are blue.


2.19. Find the number n of ways that 7 people can arrange themselves: (a) in a row of 7 chairs, (b)
around a circular table.
(a) The 7 people can arrange themselves in a row in n = 7 · 6 · 5 · 4 · 3 · 2 · 1 = 7! ways.
(b) One person can sit at any place at the circular table. The other 6 people can then arrange
themselves in n = 6 · 5 · 4 · 3 · 2 · 1 = 6! ways around the table.


This is an example of a circular permutation. In general, n objects can be arranged in a circle
in (n − 1)(n − 2) ··· 3 · 2 · 1 = (n − 1)! ways.


2.20. Suppose repetitions are not allowed. (a) Find the number n of three-digit numbers that can 
be formed from the six digits: 2,3,5,6,7,9. (b) How many of them are even? (c) How many 
of them exceed 400? 


There are 6 digits, and the three-digit number may be pictured by


____ , ____ , ____


In each case, write down the number of ways that one can fill each of the positions.


(a) There are 6 ways to fill the first position, 5 ways for the second position, and 4 ways for the third
position. This may be pictured by: 6 , 5 , 4 . Thus, n = 6 · 5 · 4 = 120.
Alternately, n is the number of permutations of 6 things taken 3 at a time, and so


n = P(6, 3) = 6 · 5 · 4 = 120
(b) Since the numbers must be even, the last digit must be either 2 or 6. Thus, the third position is filled
first and it can be done in 2 ways. Then there are 5 ways to fill the middle position and 4 ways
to fill the first position. This may be pictured by: 4 , 5 , 2 . Thus, 4 · 5 · 2 = 40 of the
numbers are even.


(c) Since the numbers must exceed 400, they must begin with 5, 6, 7, or 9. Thus, we first fill the first
position and it can be done in 4 ways. Then there are 5 ways to fill the second position and 4 ways
to fill the third position. This may be pictured by: 4 , 5 , 4 . Thus, 4 · 5 · 4 = 80 of the
numbers exceed 400.


2.21. A class contains 8 students. Find the number of ordered samples of size 3:
(a) with replacement, (b) without replacement.


(a) Each student in the ordered sample can be chosen in 8 ways; hence there are 8 · 8 · 8 = 8³ = 512
samples of size 3 with replacement.


(b) The first student in the sample can be chosen in 8 ways, the second in 7 ways, and the last in 6
ways. Thus, there are 8 · 7 · 6 = 336 samples of size 3 without replacement.


2.22. Find n if: (a) P(n, 2) = 72, (b) 2P(n, 2) + 50 = P(2n, 2).
(a) P(n, 2) = n(n − 1) = n² − n; hence
n² − n = 72 or n² − n − 72 = 0 or (n − 9)(n + 8) = 0
Since n must be positive, the only answer is n = 9.
(b) P(n, 2) = n(n − 1) = n² − n and P(2n, 2) = 2n(2n − 1) = 4n² − 2n. Hence:
2(n² − n) + 50 = 4n² − 2n or 2n² − 2n + 50 = 4n² − 2n
or 50 = 2n² or n² = 25


Since n must be positive, the only answer is n = 5.
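
The product-rule counts in Problems 2.20 and 2.21 are small enough to confirm by listing every case. The sketch below is an illustrative brute-force check, not part of the original solutions; the variable names are assumptions made for this example.

```python
from itertools import permutations, product

digits = (2, 3, 5, 6, 7, 9)

# Problem 2.20: three-digit numbers with no repeated digit.
numbers = [100 * a + 10 * b + c for a, b, c in permutations(digits, 3)]
total = len(numbers)
even = sum(1 for x in numbers if x % 2 == 0)
over_400 = sum(1 for x in numbers if x > 400)

# Problem 2.21: ordered samples of size 3 from 8 students.
students = range(8)
with_replacement = sum(1 for _ in product(students, repeat=3))
without_replacement = sum(1 for _ in permutations(students, 3))

print(total, even, over_400)
print(with_replacement, without_replacement)
```

Listing the cases gives 120, 40, and 80 for Problem 2.20, and 512 and 336 for Problem 2.21.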


COMBINATIONS AND PARTITIONS 


2.23. There are 12 students who are eligible to attend the National Student Association annual
meeting. Find the number n of ways a delegation of 4 students can be selected from the 12
eligible students.


This concerns combinations, not permutations, since order does not count in a delegation. There are
“12 choose 4” such delegations. That is,

$$n = C(12, 4) = \binom{12}{4} = \frac{12 \cdot 11 \cdot 10 \cdot 9}{4 \cdot 3 \cdot 2 \cdot 1} = 495$$


2.24. A student is to answer 8 out of 10 questions on an exam.
(a) Find the number n of ways the student can choose the eight questions.
(b) Find n if the student must answer the first three questions.

(a) The 8 questions can be selected “10 choose 8” ways. That is,

$$n = C(10, 8) = \binom{10}{8} = \binom{10}{2} = \frac{10 \cdot 9}{2 \cdot 1} = 45$$

(b) If the first 3 questions are answered, then the student must choose the other 5 questions from the
remaining 7 questions. Hence

$$n = C(7, 5) = \binom{7}{5} = \binom{7}{2} = \frac{7 \cdot 6}{2 \cdot 1} = 21$$


2.25. A class contains 10 students with 6 men and 4 women. Find the number n of ways:

(a) A 4-member committee can be selected from the students.
(b) A 4-member committee with 2 men and 2 women can be selected.
(c) The class can elect a president, vice-president, treasurer, and secretary.


(a) This concerns combinations, not permutations, since order does not count in a committee. There are
“10 choose 4” such committees. That is,

$$n = C(10, 4) = \binom{10}{4} = \frac{10 \cdot 9 \cdot 8 \cdot 7}{4 \cdot 3 \cdot 2 \cdot 1} = 210$$

(b) The 2 men can be chosen from the 6 men in $\binom{6}{2}$ ways and the 2 women can be chosen from the 4
women in $\binom{4}{2}$ ways. Thus, by the product rule,

$$n = \binom{6}{2}\binom{4}{2} = \frac{6 \cdot 5}{2 \cdot 1} \cdot \frac{4 \cdot 3}{2 \cdot 1} = 15 \cdot 6 = 90$$

(c) This concerns permutations, not combinations, since order does count. The four officers are drawn
from all 10 students. Thus,

n = P(10, 4) = 10 · 9 · 8 · 7 = 5040


2.26. A box contains 7 blue socks and 5 red socks. Find the number n of ways two socks can be
drawn from the box if: (a) They can be any color; (b) They must be the same color.


(a) There are “12 choose 2” ways to select 2 of the 12 socks. That is,

$$n = C(12, 2) = \binom{12}{2} = \frac{12 \cdot 11}{2 \cdot 1} = 66$$

(b) There are C(7, 2) = 21 ways to choose 2 of the 7 blue socks and C(5, 2) = 10 ways to choose 2 of the
5 red socks. By the sum rule, n = 21 + 10 = 31.


2.27. Let A, B, ..., L be 12 given points in the plane R² such that no 3 of the points lie on the same
line. Find the number n of:


(a) Lines in R² where each line contains two of the points.

(b) Lines in R² containing A and one of the other points.

(c) Triangles whose vertices come from the given points.

(d) Triangles whose vertices are A and two of the other points.


Since order does not count, this problem involves combinations.


(a) Each pair of points determines a line; hence

$$n = \text{“12 choose 2”} = C(12, 2) = \binom{12}{2} = \frac{12 \cdot 11}{2 \cdot 1} = 66$$

(b) We need only choose one of the 11 remaining points; hence n = 11.


(c) Each triple of points determines a triangle; hence

$$n = \text{“12 choose 3”} = C(12, 3) = \binom{12}{3} = \frac{12 \cdot 11 \cdot 10}{3 \cdot 2 \cdot 1} = 220$$

(d) We need only choose two of the 11 remaining points; hence n = C(11, 2) = 55. (Alternately, there
are C(11, 3) = 165 triangles without A as a vertex; hence 220 − 165 = 55 of the triangles do have A
as a vertex.)


2.28. There are 12 students in a class. Find the number n of ways that the 12 students can take 3
different tests if 4 students are to take each test.


There are C(12, 4) = 495 ways to choose 4 students to take the first test; following this, there are
C(8, 4) = 70 ways to choose 4 students to take the second test. The remaining students take the third
test. Thus

n = 70(495) = 34,650


2.29. Find the number n of ways 12 students can be partitioned into 3 teams A₁, A₂, A₃, so that each
team contains 4 students. (Compare with the preceding Problem 2.28.)

Let A denote one of the students. There are C(11, 3) = 165 ways to choose 3 other students to be
on the same team as A. Now let B be a student who is not on the same team as A. Then there are
C(7, 3) = 35 ways to choose 3 from the remaining students to be on the same team as B. The remaining
4 students form the third team. Thus, n = 35(165) = 5775.

Alternately, each partition [A₁, A₂, A₃] can be arranged in 3! = 6 ways as an ordered partition. By
the preceding Problem 2.28, there are 34,650 such ordered partitions. Thus, n = 34,650/6 = 5775.


2.30. Find the number n of committees of 5 with a given chairperson that can be selected from 12
persons.


Method 1: The chairperson can be chosen in 12 ways and, following this, the other 4 on the
committee can be chosen from the remaining 11 people in C(11, 4) = 330 ways. Thus,

n = 12(330) = 3960


Method 2: The 5-member committee can be chosen from the 12 persons in C(12, 5) = 792 ways.
Each committee can then select a chairperson in 5 ways. Thus,

n = 5(792) = 3960


2.31. There are n married couples at a party. (a) Find the number N of (unordered) pairs at the
party. (b) Suppose every person shakes hands with every other person other than his or her
spouse. Find the number M of handshakes.


(a) There are 2n people at the party, and so there are “2n choose 2” pairs. That is,

$$N = C(2n, 2) = \frac{2n(2n - 1)}{2} = n(2n - 1) = 2n^2 - n$$

(b) M is equal to the number of pairs who are not married. There are n married pairs. Thus,
using (a),

M = 2n² − n − n = 2n² − 2n = 2n(n − 1)
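
The difference between ordered partitions (Problem 2.28, where the tests are distinguishable) and unordered partitions (Problem 2.29, where the teams are not) can be checked numerically. The sketch below is illustrative only; the three function names are hypothetical helpers introduced for this example. The last function counts unlabelled partitions directly, using the same idea as the first solution of Problem 2.29: always build the team of the first remaining student.

```python
from itertools import combinations
from math import comb, factorial


def ordered_partitions(n: int, size: int) -> int:
    """Ways to split n labelled people into n/size labelled groups of equal size."""
    ways, remaining = 1, n
    while remaining > 0:
        ways *= comb(remaining, size)
        remaining -= size
    return ways


def unordered_partitions(n: int, size: int) -> int:
    """Same split, but the groups themselves carry no labels."""
    return ordered_partitions(n, size) // factorial(n // size)


def brute_force_unordered(people: tuple, size: int) -> int:
    """Count unlabelled partitions by always placing the first person first."""
    if not people:
        return 1
    rest = people[1:]
    total = 0
    for mates in combinations(rest, size - 1):
        left = tuple(p for p in rest if p not in mates)
        total += brute_force_unordered(left, size)
    return total


print(ordered_partitions(12, 4))     # distinguishable tests, as in Problem 2.28
print(unordered_partitions(12, 4))   # indistinguishable teams, as in Problem 2.29
print(brute_force_unordered(tuple(range(12)), 4))
```

The ordered count is 495 · 70 = 34,650, and both the division by 3! and the direct count give 5775.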


TREE DIAGRAMS

2.32. Construct the tree diagram that gives the permutations of {a, b, c}. 


The tree diagram, drawn downward with the “root” on the top, appears in Fig. 2-5. Each path from 
the root to an endpoint (“leaf”) of the tree represents a permutation. There are 6 such paths which yield 
the following 6 permutations: 


abc, acb, bac, bca, cab, cba 


[Tree diagram: the root branches to a, b, and c; each branch then splits on the two remaining letters.]


Fig. 2-5


2.33. Audrey has time to play roulette at most 5 times. At each play she wins or loses $1. She 
begins with $1 and will stop playing before 5 plays if she loses all her money. 


(a) Find the number of ways the betting can occur. 
(b) How many cases will she stop before playing 5 times? 
(c) How many cases will she leave without any money? 


Construct the appropriate tree diagram as shown in Fig. 2-6. Each number in the diagram denotes
the number of dollars she has at that moment in time. Thus, the root, which is circled, is labeled with the
number 1.


Fig. 2-6


(a) There are 14 paths from the root of the tree to an endpoint (“leaf”), so the betting can occur in 14
different ways.


(b) There are only 2 paths with less than 5 edges, so Audrey will not play 5 times in only 2 of the 
cases. 


(c) There are 4 paths which end in 0 or, in other words, only 4 of the leaves are labeled with 0. Thus, 
Audrey will leave without any money in 4 of the cases. 
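
Since the tree diagram of Problem 2.33 is generated by a simple rule, its paths can also be enumerated by a short recursive program. This is a minimal sketch, assuming the rules exactly as stated in the problem (start with $1, win or lose $1 per play, stop at $0 or after 5 plays); the function name `betting_paths` is an illustrative choice.

```python
def betting_paths(start: int = 1, max_plays: int = 5) -> list[tuple[int, ...]]:
    """Return every root-to-leaf path as the sequence of dollar amounts held."""
    paths = []

    def play(history: tuple[int, ...]) -> None:
        money, plays = history[-1], len(history) - 1
        if money == 0 or plays == max_plays:   # broke, or out of time: a leaf
            paths.append(history)
            return
        play(history + (money + 1,))           # win $1
        play(history + (money - 1,))           # lose $1

    play((start,))
    return paths


paths = betting_paths()
total = len(paths)
stopped_early = sum(1 for p in paths if len(p) - 1 < 5)
broke = sum(1 for p in paths if p[-1] == 0)
print(total, stopped_early, broke)
```

The enumeration yields 14 paths in all, 2 that stop before the fifth play, and 4 that end with no money.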


MISCELLANEOUS PROBLEMS 


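2.34. Prove Theorem 2.2 (binomial theorem): $(a + b)^n = \sum_{r=0}^{n} \binom{n}{r} a^{n-r} b^r$.

The theorem is true for n = 1, since

$$\sum_{r=0}^{1} \binom{1}{r} a^{1-r} b^r = \binom{1}{0} a^1 b^0 + \binom{1}{1} a^0 b^1 = a + b = (a + b)^1$$

We assume the theorem holds for $(a + b)^n$ and prove it is true for $(a + b)^{n+1}$:

$$(a + b)^{n+1} = (a + b)(a + b)^n = (a + b)\left[a^n + \binom{n}{1} a^{n-1} b + \cdots + \binom{n}{r-1} a^{n-r+1} b^{r-1} + \binom{n}{r} a^{n-r} b^r + \cdots + \binom{n}{1} a b^{n-1} + b^n\right]$$

Now the term in the product which contains $b^r$ is obtained from

$$b\left[\binom{n}{r-1} a^{n-r+1} b^{r-1}\right] + a\left[\binom{n}{r} a^{n-r} b^r\right] = \binom{n}{r-1} a^{n-r+1} b^r + \binom{n}{r} a^{n-r+1} b^r = \left[\binom{n}{r-1} + \binom{n}{r}\right] a^{n-r+1} b^r$$

But, by Theorem 2.3, $\binom{n}{r-1} + \binom{n}{r} = \binom{n+1}{r}$. Thus, the term containing $b^r$ is

$$\binom{n+1}{r} a^{n-r+1} b^r$$

Note that $(a + b)(a + b)^n$ is a polynomial of degree n + 1 in b. Consequently,

$$(a + b)^{n+1} = (a + b)(a + b)^n = \sum_{r=0}^{n+1} \binom{n+1}{r} a^{n-r+1} b^r$$

which was to be proved.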
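
A numerical spot check is no substitute for the induction proof, but it is a quick way to catch a slip in the indices. The sketch below tests the identity C(n, r − 1) + C(n, r) = C(n + 1, r) used in the inductive step, and then the binomial theorem itself, on small integer values. The function name `binomial_expansion` is an assumption made for this example.

```python
from math import comb


def binomial_expansion(a: int, b: int, n: int) -> int:
    """Right-hand side of the binomial theorem, summed term by term."""
    return sum(comb(n, r) * a ** (n - r) * b ** r for r in range(n + 1))


# The identity used in the inductive step.
for n in range(1, 12):
    for r in range(1, n + 1):
        assert comb(n, r - 1) + comb(n, r) == comb(n + 1, r)

# The theorem itself, spot-checked on a grid of small integers.
for a in range(-3, 4):
    for b in range(-3, 4):
        for n in range(0, 9):
            assert binomial_expansion(a, b, n) == (a + b) ** n

print(binomial_expansion(1, 1, 4))
```

With a = b = 1 and n = 4 the sum of the binomial coefficients is 2⁴ = 16.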


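2.35. Prove: $\binom{4}{0} + \binom{4}{1} + \binom{4}{2} + \binom{4}{3} + \binom{4}{4} = 16$.


Note that 16 = 2⁴ = (1 + 1)⁴. Expanding (1 + 1)⁴, using the binomial theorem, yields:

$$16 = (1 + 1)^4 = \binom{4}{0} 1^4 + \binom{4}{1} 1^3 1^1 + \binom{4}{2} 1^2 1^2 + \binom{4}{3} 1^1 1^3 + \binom{4}{4} 1^4 = \binom{4}{0} + \binom{4}{1} + \binom{4}{2} + \binom{4}{3} + \binom{4}{4}$$


2.36. Let n and $n_1, n_2, \ldots, n_r$ be nonnegative integers such that $n_1 + n_2 + \cdots + n_r = n$. The
multinomial coefficients are denoted and defined by

$$\binom{n}{n_1, n_2, \ldots, n_r} = \frac{n!}{n_1!\, n_2! \cdots n_r!}$$

Compute the following multinomial coefficients:

(a) $\binom{6}{3, 2, 1}$, (b) $\binom{8}{4, 2, 2, 0}$, (c) $\binom{10}{5, 3, 2, 2}$


Use the above formula to obtain:

(a) $\binom{6}{3, 2, 1} = \frac{6!}{3!\, 2!\, 1!} = \frac{6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{3 \cdot 2 \cdot 1 \cdot 2 \cdot 1 \cdot 1} = 60$

(b) $\binom{8}{4, 2, 2, 0} = \frac{8!}{4!\, 2!\, 2!\, 0!} = \frac{8 \cdot 7 \cdot 6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{4 \cdot 3 \cdot 2 \cdot 1 \cdot 2 \cdot 1 \cdot 2 \cdot 1 \cdot 1} = 420$

(Here we use the fact that 0! = 1.)

(c) The expression $\binom{10}{5, 3, 2, 2}$ has no meaning, since 5 + 3 + 2 + 2 ≠ 10.


2.37. Suppose S contains n elements, and let $n_1, n_2, \ldots, n_r$ be positive integers such that

$$n_1 + n_2 + \cdots + n_r = n$$

Prove there exist

$$\binom{n}{n_1, n_2, \ldots, n_r} = \frac{n!}{n_1!\, n_2! \cdots n_r!}$$

different ordered partitions of S of the form $[A_1, A_2, \ldots, A_r]$ where $A_1$ contains $n_1$ elements, $A_2$
contains $n_2$ elements, ..., $A_r$ contains $n_r$ elements.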

We begin with n elements in S; hence there are $\binom{n}{n_1}$ ways of selecting the cell $A_1$. Following this,
there are $n - n_1$ elements left in S, that is, in $S \setminus A_1$; hence there are $\binom{n - n_1}{n_2}$ ways of selecting the cell
$A_2$. Similarly, for $i = 3, 4, \ldots, r$, there are $\binom{n - n_1 - \cdots - n_{i-1}}{n_i}$ ways of selecting the cell $A_i$.
Accordingly, there are

$$\binom{n}{n_1} \binom{n - n_1}{n_2} \binom{n - n_1 - n_2}{n_3} \cdots \binom{n - n_1 - \cdots - n_{r-1}}{n_r} \qquad (*)$$

different ordered partitions of S. Now (*) is equal to

$$\frac{n!}{n_1!\,(n - n_1)!} \cdot \frac{(n - n_1)!}{n_2!\,(n - n_1 - n_2)!} \cdot \frac{(n - n_1 - n_2)!}{n_3!\,(n - n_1 - n_2 - n_3)!} \cdots \frac{(n - n_1 - \cdots - n_{r-1})!}{n_r!\,(n - n_1 - \cdots - n_r)!}$$

But this is equal to

$$\frac{n!}{n_1!\, n_2! \cdots n_r!} = \binom{n}{n_1, n_2, \ldots, n_r}$$

since each numerator after the first is cancelled by the second term in the preceding denominator and since
$(n - n_1 - \cdots - n_r)! = 0! = 1$. Thus, the theorem is proved.
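
The multinomial coefficient of Problem 2.36 and the counting argument of Problem 2.37 can be compared directly on small cases. In this sketch the names `multinomial` and `count_ordered_partitions` are hypothetical helpers introduced for illustration. The second function mirrors the proof: it selects the cells one at a time from whatever elements remain.

```python
from itertools import combinations
from math import factorial


def multinomial(n: int, parts: tuple[int, ...]) -> int:
    """n! / (n1! n2! ... nr!); raises if the parts do not sum to n."""
    if sum(parts) != n:
        raise ValueError("parts must sum to n")
    value = factorial(n)
    for part in parts:
        value //= factorial(part)
    return value


def count_ordered_partitions(elements: frozenset, parts: tuple[int, ...]) -> int:
    """Count ordered partitions [A1, ..., Ar] by choosing the cells one at a time."""
    if not parts:
        return 1 if not elements else 0
    first, rest = parts[0], parts[1:]
    return sum(
        count_ordered_partitions(elements - frozenset(cell), rest)
        for cell in combinations(sorted(elements), first)
    )


print(multinomial(6, (3, 2, 1)), multinomial(8, (4, 2, 2, 0)))
print(count_ordered_partitions(frozenset(range(6)), (3, 2, 1)))
```

Raising an error when the parts do not sum to n matches part (c) of Problem 2.36, where the expression has no meaning.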


Supplementary Problems


FACTORIAL NOTATION AND BINOMIAL COEFFICIENTS 


2.38. 


2.39, 


2.40. 


2.41. 


2.42. 


2.43. 


2.44, 


Find: (a) 10!, 11!, 12!; (b) 60! (Hint: Use Stirling’s approximation to n!.)


Compute: (a) ~. (b) = () =, (d) a 
Simplify: (a) eee (b) a (<) oon @) or 


Compute: (a) er (b) (3). (c) ey (4) ch (e) cer (f) oe 


Show that: (a) 4 (7) 


@ ()-(1) 


Evaluate the following multinomial coefficients (defined in Problem 2.36): 


- ie a ~ ine ae) © (ssa) i (4.3.2): 


Find the (a) ninth and (b) tenth rows of Pascal’s triangle, assuming the following is the eighth row: 


1 8 28 56 70 56 28 8 1


COUNTING PRINCIPLES, SUM AND PRODUCT RULES 


2.45. A store sells clothes for men. It has 3 different kinds of jackets, 7 different kinds of shirts, and 5 different
kinds of pants. Find the number of ways a person can buy:
(a) one of the items for a present, (b) one of each of the items for a present.

2.46. A restaurant has, on its dessert menu, 4 kinds of cakes, 2 kinds of cookies, and 3 kinds of ice cream. Find
the number of ways a person can select: (a) one of the desserts, (b) one of each kind of dessert.

2.47. A class contains 8 male students and 6 female students. Find the number of ways that the class can elect:
(a) a class representative; (b) 2 class representatives, 1 male and 1 female; (c) a president and a
vice-president.

2.48. Suppose a password consists of 4 characters where the first character must be a letter of the (English)
alphabet, but each of the other characters may be a letter or a digit. Find the number of:
(a) passwords, (b) passwords beginning with one of the 5 vowels.

2.49. Suppose a code consists of 2 letters followed by 3 digits. Find the number of:
(a) codes, (b) codes with distinct letters, (c) codes with the same letters.

2.50. There are 6 roads between A and B and 4 roads between B and C. Find the number n of ways a person
can drive: (a) from A to C by way of B, (b) round-trip from A to C by way of B, (c) round-trip from A
to C by way of B without using the same road more than once.

PERMUTATIONS AND ORDERED SAMPLES


2.51. Find the number n of ways a judge can award first, second, and third places in a contest with 18 
contestants. 

2.52. Find the number n of ways 6 people can ride a toboggan where: (a) anyone can drive, (b) one of 3 must 
drive. 

2.53. A debating team consists of 3 boys and 3 girls. Find the number n of ways they can sit in a row where: 
(a) there are no restrictions, (b) the boys and girls are each to sit together, (c) just the girls are to sit 
together. 

2.54. Find the number n of permutations that can be formed from all the letters of each word: 

(a) QUEUE, (b) COMMITTEE, (c) PROPOSITION, (d) BASEBALL. 

2.55. Find the number n of different signals, each consisting of 8 flags hung in a vertical line, that can be formed 
from 4 identical red flags, 2 identical blue flags, and 2 identical green flags. 

2.56. Find the number n of ways 5 large books, 4 medium-size books, and 3 small books can be placed on a shelf 
so that all books of the same size are together. 

2.57. A box contains 12 light bulbs. Find the number n of ordered samples of size 3: 

(a) with replacement, (b) without replacement. 

2.58. A class contains 10 students. Find the number n of ordered samples of size 4: 
(a) with replacement, (b) without replacement. 

COMBINATIONS 

2.59. A restaurant has 6 different desserts. Find the number of ways a customer can choose 2 of the 
desserts. 

2.60. A store has 8 different mystery books. Find the number of ways a customer can buy 3 of the books. 

2.61. A box contains 6 blue socks and 4 white socks. Find the number of ways two socks can be drawn from 
the box where: (a) there are no restrictions, (b) they are different colors, (c) they are to be the same 
color. 

2.62. A class contains 9 boys and 3 girls. Find the number of ways a teacher can select a committee of 4. 

2.63. Repeat Problem 2.62, but where: (a) there are to be 2 boys and 2 girls, (b) there is to be exactly 1 girl, (c) 
there is to be at least 1 girl. 

2.64. A woman has 11 close friends. Find the number of ways she can invite 5 of them to dinner. 

2.65. Repeat Problem 2.64, but where 2 of the friends are married and will not attend separately. 

2.66. Repeat Problem 2.64, but where 2 of the friends are not on speaking terms and will not attend 
together. 

2.67. A person is dealt a poker hand (5 cards) from an ordinary deck with 52 cards. Find the number of ways 
the person can be dealt: (a) four of a kind, (b) a flush. 

2.68. A student must answer 10 out of 13 questions. (a) How many choices are there? (b) How many if the
student must answer the first 2 questions? (c) How many if the student must answer the first or second
question but not both?

PARTITIONS

2.69. Find the number of ways 6 toys may be divided evenly among 3 children. 

2.70. Find the number of ways 6 students can be partitioned into 3 teams containing 2 students each. (Compare 
with Problem 2.69.) 

2.71. Find the number of ways 6 students can be partitioned into 2 teams where each team contains 2 or more 
students. 

2.72. Find the number of ways 9 toys may be divided among 4 children if the youngest is to receive 3 toys and 
each of the others 2 toys. 

2.73. There are 9 students in a class. Find the number of ways the students can take 3 tests if 3 students are 
to take each test. 

2.74. There are 9 students in a class. Find the number of ways the students can be partitioned into 3 teams
containing 3 students each. (Compare with Problem 2.73.)


TREE DIAGRAMS


2.75. Teams A and B play in the world series of baseball where the team that first wins 4 games wins the
series. Suppose A wins the first game and that the team that wins the second game also wins the fourth
game. (a) Find the number n of ways the series can occur, and list the n ways the series can occur.
(b) How many ways will B win the series? (c) How many ways will the series last 7 games?

2.76. Suppose A, B, ..., F in Fig. 2-7 denote islands, and the lines connecting them bridges. A person begins
at A and walks from island to island. The person stops for lunch when he or she cannot continue to walk
without crossing the same bridge twice. (a) Construct the appropriate tree diagram, and find the number
of ways the person can walk before eating lunch. (b) At which islands can he or she eat lunch?


Fig. 2-7


Answers to Supplementary Problems

2.38. (a) 3,628,800; 39,916,800; 479,001,600. (b) log(60!) = 81.92, so 60! ≈ 8.32 × 10⁸¹.

2.39. (a) 240; (b) 2184; (c) 1/90; (d) 1/1716.

2.40. (a) n + 1; (b) n(n − 1) = n² − n; (c) 1/[n(n + 1)(n + 2)]; (d) (n − r)(n − r + 1).

2.41. (a) 10; (b) 35; (c) 91; (d) 15; (e) 1140; (f) 816.

2.42. Hint: Expand (a) (1 + 1)ⁿ; (b) (1 − 1)ⁿ.

2.43. (a) 60; (b) 210; (c) 504; (d) Not defined. 

2.44. (a) 1, 9, 36, 84, 126, 126, 84, 36, 9, 1; (b) 1, 10, 45, 120, 210, 252, 210, 120, 45, 10, 1. 
2.45. (a) 15; (b) 105. 

2.46. (a) 9; (b) 24. 

2.47. (a) 14; (b) 48; (c) 182. 

2.48. (a) 26 · 36³; (b) 5 · 36³.

2.49. (a) 26² · 10³ = 676,000; (b) 26 · 25 · 10³ = 650,000; (c) 26 · 10³ = 26,000.

2.50. (a) 24; (b) 24² = 576; (c) 360.

2.51. n=18-17-16 = 4896. 

2.52. (a) 6! = 720; (b) 3-5! = 360. 

2.53. (a) 6! = 720; (b) 2 · 3! · 3! = 72; (c) 4 · 3! · 3! = 144.

2.54. (a) 30; (b) 9!/(2! 2! 2!) = 45,360; (c) 11!/(2! 3! 2!) = 1,663,200; (d) 8!/(2! 2! 2!) = 5040.

2.55. n = 8!/(4! 2! 2!) = 420.


2.56. 3!5!4!3! = 103,680. 

2.57. (a) 12³ = 1728; (b) 12 · 11 · 10 = 1320.

2.58. (a) 10⁴ = 10,000; (b) 10 · 9 · 8 · 7 = 5040.

2.59. C(6,2) = 15. 

2.60. C(8,3) = 56. 

2.61. (a) C(10, 2) = 45; (b) 6 · 4 = 24; (c) C(6, 2) + C(4, 2) = 21 or 45 − 24 = 21.
2.62. C(12,4) = 495. 


2.63. (a) C(9, 2) · C(3, 2) = 108; (b) C(9, 3) · 3 = 252;
(c) 9 + 108 + 252 = 369 or C(12, 4) − C(9, 4) = 495 − 126 = 369.


2.64. C(11,5) = 462. 
2.65. 210. 


2.66. C(11, 5) − C(9, 3) = 462 − 84 = 378.

2.67. (a) 13 · 48 = 624; (b) 4 · C(13, 5) = 5148.

2.68. (a) C(13, 10) = C(13, 3) = 286; (b) C(11, 8) = C(11, 3) = 165; (c) 2 · C(11, 9) = 2 · C(11, 2) = 110.

2.69. 90.

2.70. 15.

2.71. (Hint: The number of subsets excluding ∅ and the 6 singleton subsets.) 2⁵ − 1 − 6 = 25.

2.72. 9!/(3! 2! 2! 2!) = 7560.

2.73. 9!/(3! 3! 3!) = 1680.

2.74. 1680/3! = 280.

2.75. Construct the appropriate tree diagram as in Fig. 2-8. Note that the tree begins at A, the winner of the first
game, and that there is only one choice in the fourth game, the winner of the second game. (a) The diagram
shows that n = 15 and that the series can occur in the following 15 ways:


AAAA, AABAA, AABABA, AABABBA, AABABBB, ABABAA, ABABABA, ABABABB,
ABABBAA, ABABBAB, ABABBB, ABBBAAA, ABBBAAB, ABBBAB, ABBBB


(b) 7; (c) 8.
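
The tree for Problem 2.75 can be generated by a short program, which is a convenient way to check the counts against the list above. This is a sketch under the rules stated in the problem (A wins game 1, the winner of game 2 also wins game 4, and the first team with 4 wins takes the series); the function name `series_outcomes` is an illustrative choice.

```python
def series_outcomes() -> list[str]:
    """Enumerate every series in which A wins game 1 and the winner of
    game 2 also wins game 4; the first team to 4 wins takes the series."""
    outcomes = []

    def extend(games: str) -> None:
        if games.count("A") == 4 or games.count("B") == 4:
            outcomes.append(games)          # series is over: a leaf of the tree
            return
        if len(games) == 3:
            extend(games + games[1])        # game 4 is forced by game 2
        else:
            extend(games + "A")
            extend(games + "B")

    extend("A")                             # A wins the first game
    return outcomes


outcomes = series_outcomes()
b_wins = [s for s in outcomes if s.endswith("B")]
seven_games = [s for s in outcomes if len(s) == 7]
print(len(outcomes), len(b_wins), len(seven_games))
```

The enumeration gives 15 series in all, of which 7 end with a win for B and 8 last the full 7 games.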

Fig. 2-8


2.76. (a) See Fig. 2-9. There are 11 ways to take his walk. (b) B, D, or E.


Fig. 2-9


Introduction to 
Probability 


3.1 INTRODUCTION 


Probability theory is a mathematical modeling of the phenomenon of chance or randomness. If 
a coin is tossed in the air, it can land heads or tails, but we do not know which of these will occur in 
a single toss. However, suppose we repeat this experiment of tossing a coin; let s be the number of 
successes, that is, that a head appears, and let n be the number of tosses. Then it has been empirically 
observed that the ratio f = s/n, called the relative frequency of the outcome, becomes stable in the long 
run, that is, the ratio f = s/n approaches a limit. If the coin is perfectly balanced, then we expect that 
the coin will land heads approximately 50 percent of the time or, in other words, the relative frequency 
will approach 1/2. Alternately, assuming the coin is perfectly balanced, we can arrive at the value 1/2 
deductively. That is, one side of the coin is as likely to occur as the other; hence the chance of getting
a head is one in two, which means the probability of getting a head is 1/2. Although the specific
outcome on any one toss is unknown, the behavior over the long run is determined. This stable 
long-run behavior of random phenomena forms the basis of probability theory. 

Consider another experiment, the tossing of a six-sided die (Fig. 3-1) and observing the number 
of dots, or pips, that appear on the top face. Suppose the experiment is repeated n times and let s be 
the number of times 4 dots appear on top. Again, as n increases, the relative frequency f = s/n of the 
outcome 4 becomes more stable. Assuming the die is perfectly balanced, we would expect that the
stable or long-run value of this ratio is 1/6, and we say the probability of getting a 4 is 1/6. Alternately,
we can arrive at the value 1/6 deductively. That is, with a perfectly balanced die, any one side of the
die is as likely as any other to occur on top. Thus, the chance of getting a 4 is one in six or, in other
words, the probability of getting a 4 is 1/6. Again, although the specific outcome on any one toss is
unknown, the behavior over the long run is determined.


Fig. 3-1
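
The stabilizing of the relative frequency f = s/n can be observed with a small simulation. The sketch below is illustrative only: it uses Python's pseudo-random generator as a stand-in for a perfectly balanced die, and the seed value and the function name `relative_frequency` are arbitrary assumptions made for this example.

```python
import random


def relative_frequency(trials: int, rng: random.Random) -> float:
    """Toss a simulated fair die `trials` times and return f = s/n for the outcome 4."""
    successes = sum(1 for _ in range(trials) if rng.randint(1, 6) == 4)
    return successes / trials


rng = random.Random(2011)  # fixed seed so that reruns give the same numbers
for n in (10, 100, 10_000, 100_000):
    print(n, relative_frequency(n, rng))
```

For small n the ratio fluctuates noticeably; for large n it settles near 1/6 ≈ 0.1667, which is the behavior described above.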

The historical development of probability theory is similar to the above discussion. That is,
letting E denote the outcome of an experiment, called an event, there were two ways to obtain the 
probability p of E: 


(a) Classical (A Priori) Definition: Suppose an event E can occur in s ways out of a total of n 
equally likely possible ways. Then p = s/n. 


(b) Frequency (A Posteriori) Definition: Suppose after n repetitions, where n is very large, an event 
E occurs s times. Then p = s/n. 


Both of the above definitions have serious flaws. The classical definition is essentially circular since 
the idea of “equally likely” is the same as that of “with equal probability” which has not been 
defined. The frequency definition is not well defined since “very large” has not been defined.

The modern treatment of probability theory is axiomatic using set theory. Specifically, a 
mathematical model of an experiment is obtained by arbitrarily assigning probabilities to all the 
events, except that the assignments must satisfy certain axioms listed below. Naturally, the reliability 
of our mathematical model for a given experiment depends upon the closeness of the assigned 
probabilities to the actual limiting relative frequencies. This then gives rise to problems of testing and 
reliability, which form the subject matter of statistics. 


3.2 SAMPLE SPACE AND EVENTS 


The set S of all possible outcomes of some experiment is called the sample space. A particular 
outcome, that is, an element of S, is called a sample point. An event A is a set of outcomes or, in other 
words, a subset of the sample space S. The event {a} consisting of a single point a € S is called an 
elementary event. The empty set ∅ and S are subsets of S and hence they are events; ∅ is sometimes
called the impossible or null event, and S is sometimes called the certain or sure event.

Events can be combined to form new events using the various set operations:

(i) A ∪ B is the event that occurs iff A occurs or B occurs (or both).
(ii) A ∩ B is the event that occurs iff A occurs and B occurs.
(iii) Aᶜ, the complement of A, is the event that occurs iff A does not occur.
(Here “iff” is an abbreviation of “if and only if”.)

Events A and B are called mutually exclusive if they are disjoint, that is, if A ∩ B = ∅. In other
words, A and B are mutually exclusive if they cannot occur simultaneously. Three or more events are
mutually exclusive if every two of them are mutually exclusive.


EXAMPLE 3.1 


(a) Experiment: Toss a die and observe the number (of dots) that appears on top face. 
The sample space S consists of the six possible numbers, that is, 


S = {1, 2, 3, 4, 5, 6} 


Let A be the event that an even number occurs, B that an odd number occurs, and C that a number greater 
than 3 occurs, that is, let 


A = {2, 4, 6}, B = {1,3, 5}, C = {4,5, 6} 
Then:


A ∪ C = {2, 4, 5, 6} = the event that an even number or a number exceeding 3 occurs
A ∩ C = {4, 6} = the event that an even number and a number exceeding 3 occurs
Cᶜ = {1, 2, 3} = the event that a number exceeding 3 does not occur.


Note that A and B are mutually exclusive, that is, that A ∩ B = ∅. In other words, an even number and
an odd number cannot occur simultaneously.
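
Because events are sets, the operations of Example 3.1(a) map directly onto set operations in a programming language. The following sketch uses Python's built-in `frozenset`; the variable names are illustrative assumptions.

```python
S = frozenset({1, 2, 3, 4, 5, 6})   # sample space for one toss of a die
A = frozenset({2, 4, 6})            # an even number occurs
B = frozenset({1, 3, 5})            # an odd number occurs
C = frozenset({4, 5, 6})            # a number greater than 3 occurs

union = A | C                       # "A or C"
intersection = A & C                # "A and C"
complement_C = S - C                # "not C"
mutually_exclusive = A.isdisjoint(B)

print(sorted(union), sorted(intersection), sorted(complement_C), mutually_exclusive)
```

Taking the complement as `S - C` makes the dependence on the sample space explicit: the complement of an event is only defined relative to S.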


(b) Experiment: Toss a coin three times and observe the sequence of heads (H) and tails (T) that appears.
The sample space S consists of the following eight elements: 


S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT} 


Let A be the event that two or more heads appear consecutively, and B that all the tosses are the same, 
that is, 


A = {HHH, HHT, THH} and B = {HHH, TTT}


Then A ∩ B = {HHH} is the elementary event in which only heads appear. The event that five heads
appear is the empty set ∅.


(c) Experiment: Toss a coin until a head appears, and then count the number of times the coin is tossed. 

The sample space of this experiment is S = {1, 2, 3, ..., ∞}. Here ∞ refers to the case when a head never
appears, and so the coin is tossed an infinite number of times. Since every positive integer is an element 
of S, the sample space is infinite. In fact, this is an example of a sample space which is countably 
infinite. 

(d) Experiment: Let a pencil drop, head first, into a rectangular box and note the point at the bottom of the box 
that the pencil first touches. Here S consists of all the points on the bottom of the box. Let the rectangular 
area in Fig. 3-2 represent these points. Let A and B be the events that the pencil drops into the 
corresponding areas illustrated in Fig. 3-2. Then A ∩ B is the event that the pencil drops in the shaded
region in Fig. 3-2. 


Fig. 3-2 


Remark: The sample space S in Example 3.1(d) is an example of a continuous sample 
space. (A sample space S is continuous if it is an interval or a product of intervals.) In such a case, 
only special subsets (called measurable sets) will be events. On the other hand, if the sample space 
S is discrete, that is, if S is finite or countably infinite, then every subset of S is an event. 


EXAMPLE 3.2 Toss of a pair of dice. A pair of dice is tossed and the two numbers appearing on the top faces
are recorded. There are six possible numbers, 1, 2, ..., 6, on each die. Thus, S consists of the pairs of numbers
from 1 to 6, and hence n(S) = 6 · 6 = 36. Figure 3-3 shows these 36 pairs of numbers arranged in an array where
the rows are labeled by the first die and the columns by the second die. Let A be the event that the sum of the
two numbers is 6, and let B be the event that the largest of the two numbers is 4. That is, let


A = {(1, 5), (2, 4), (3, 3), (4, 2), (5, 1)}
B = {(1, 4), (2, 4), (3, 4), (4, 4), (4, 3), (4, 2), (4, 1)}


                          Second die
               1      2      3      4      5      6
          1  (1,1)  (1,2)  (1,3)  (1,4)  (1,5)  (1,6)
          2  (2,1)  (2,2)  (2,3)  (2,4)  (2,5)  (2,6)
First     3  (3,1)  (3,2)  (3,3)  (3,4)  (3,5)  (3,6)
die       4  (4,1)  (4,2)  (4,3)  (4,4)  (4,5)  (4,6)
          5  (5,1)  (5,2)  (5,3)  (5,4)  (5,5)  (5,6)
          6  (6,1)  (6,2)  (6,3)  (6,4)  (6,5)  (6,6)


Fig. 3-3


These events are pictured in Fig. 3-3. Then the event “A and B” consists of those pairs of integers whose sum
is 6 and whose largest number is 4 or, in other words, the intersection of A and B. Thus


A ∩ B = {(2, 4), (4, 2)}


Similarly, “A or B”, the sum is 6 or the largest is 4, is the union A ∪ B, and “not A”, the sum is not 6, is the
complement Aᶜ.
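
The sample space and events of Example 3.2 can be built from their defining conditions instead of being listed by hand. The sketch below is an illustration in Python; the variable names follow the example, and nothing else is taken from the text.

```python
from itertools import product

S = set(product(range(1, 7), repeat=2))          # all 36 ordered pairs
A = {(i, j) for (i, j) in S if i + j == 6}       # the sum is 6
B = {(i, j) for (i, j) in S if max(i, j) == 4}   # the largest number is 4

print(len(S), len(A), len(B))
print(sorted(A & B))          # "A and B"
print(len(A | B))             # "A or B"
print(len(S - A))             # "not A"
```

Here A has 5 elements and B has 7. Since they share 2 elements, the union A ∪ B has 5 + 7 − 2 = 10 elements, and the complement of A has 36 − 5 = 31.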


EXAMPLE 3.3 Deck of cards. A card is drawn from an ordinary deck of 52 cards which is pictured in
Fig. 3-4(a). The sample space S consists of the four suits, clubs (C), diamonds (D), hearts (H), and spades (S),
where each suit contains 13 cards which are numbered 2 to 10, and jack (J), queen (Q), king (K), and ace
(A). The hearts (H) and diamonds (D) are red cards, and the spades (S) and clubs (C) are black cards. Figure
3-4(b) pictures 52 points which represent the deck S of cards in the obvious way. Let E be the event of a picture
card, that is, a jack (J), queen (Q), or king (K), and let F be the event of a heart. Then


E ∩ F = {JH, QH, KH}


is the event of a heart and a picture card, as shaded in Fig. 3-4(b).


Fig. 3-4


3.3 AXIOMS OF PROBABILITY 


Let S be a sample space, let 𝒞 be the class of all events, and let P be a real-valued function defined
on 𝒞. Then P is called a probability function, and P(A) is called the probability of the event A, when
the following axioms hold:

[P₁] For any event A, we have P(A) ≥ 0.

[P₂] For the certain event S, we have P(S) = 1.

[P₃] For any two disjoint events A and B, we have


P(A ∪ B) = P(A) + P(B)

[P₃′] For any infinite sequence of mutually disjoint events A₁, A₂, A₃, ..., we have

P(A₁ ∪ A₂ ∪ A₃ ∪ ···) = P(A₁) + P(A₂) + P(A₃) + ···


Furthermore, when P does satisfy the above axioms, the sample space S will be called a probability
space.

The first axiom states that the probability of any event is nonnegative, and the second axiom states
that the certain or sure event S has probability 1. The next remarks concern the two axioms [P₃] and
[P₃′]. The axiom [P₃] formalizes the natural assumption that if A and B are two disjoint events, then
the probability of either of them occurring is the sum of their individual probabilities. Using
mathematical induction, we can then extend this additive property for two sets to any finite number of
disjoint events, that is, for any mutually disjoint sets A₁, A₂, ..., Aₙ, we have


P(A₁ ∪ A₂ ∪ ··· ∪ Aₙ) = P(A₁) + P(A₂) + ··· + P(Aₙ)     (*)


We emphasize that [P₃′] does not follow from [P₃], even though (*) is true for every positive integer
n. However, if the sample space S is finite, then only [P₃] is needed, that is, [P₃′] is superfluous.


Theorems on Probability Spaces 


The following theorems follow directly from our axioms, and will be proved here. We use ∎ to
indicate the end of a proof.


Theorem 3.1: The impossible event or, in other words, the empty set ∅ has probability zero, that is,
P(∅) = 0.

Proof: For any event A, we have A ∪ ∅ = A where A and ∅ are disjoint. By [P₃],

P(A) = P(A ∪ ∅) = P(A) + P(∅)

Adding −P(A) to both sides gives P(∅) = 0. ∎


The next theorem, called the complement rule, formalizes our intuition that if we hit a target, say,
p = 1/3 of the times, then we miss the target q = 1 − p = 2/3 of the times. [Recall that Aᶜ denotes the
complement of the set A.]

Theorem 3.2 (Complement Rule): For any event A, we have

P(Aᶜ) = 1 − P(A)

Proof: S = A ∪ Aᶜ where A and Aᶜ are disjoint. By [P₂], P(S) = 1. Thus, by [P₃],

1 = P(S) = P(A ∪ Aᶜ) = P(A) + P(Aᶜ)

Adding −P(A) to both sides gives us P(Aᶜ) = 1 − P(A). ∎


The next theorem tells us that the probability of any event must lie between 0 and 1. That is,

Theorem 3.3: For any event A, we have 0 ≤ P(A) ≤ 1.


Proof: By [P₁], P(A) ≥ 0. Hence we need only show that P(A) ≤ 1. Since S = A ∪ Aᶜ where
A and Aᶜ are disjoint, we get


1 = P(S) = P(A ∪ Aᶜ) = P(A) + P(Aᶜ)


Adding −P(Aᶜ) to both sides gives us P(A) = 1 − P(Aᶜ). Since P(Aᶜ) ≥ 0, we get P(A) ≤ 1, as
required. ∎


The following theorem applies to the case that one event is a subset of another event.

Theorem 3.4: If A ⊆ B, then P(A) ≤ P(B).


Proof: If A ⊆ B, then, as indicated by Fig. 3-5(a), B = A ∪ (B \ A) where A and B \ A are
disjoint. Hence


P(B) = P(A) + P(B \ A)

By [P₁], we have P(B \ A) ≥ 0; hence P(A) ≤ P(B). ∎


[Figure: three Venn diagrams.]
(a) B is shaded. (b) A is shaded. (c) A ∪ B is shaded.


Fig. 3-5


The following theorem concerns two arbitrary events. 


Theorem 3.5: For any two events A and B, we have


P(A \ B) = P(A) − P(A ∩ B)


Proof: As indicated by Fig. 3-5(b), A = (A \ B) ∪ (A ∩ B) where A \ B and A ∩ B are disjoint.
Accordingly, by [P₃],


P(A) = P(A \ B) + P(A ∩ B)


from which our result follows. ∎


The next theorem, called the general addition rule, or simply addition rule, is similar to the
inclusion-exclusion principle for sets.

Theorem 3.6 (Addition Rule): For any two events A and B,

P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

Proof: As indicated by Fig. 3-5(c), A ∪ B = (A \ B) ∪ B where A \ B and B are disjoint
sets. Thus, using Theorem 3.5,

P(A ∪ B) = P(A \ B) + P(B) = P(A) − P(A ∩ B) + P(B)
         = P(A) + P(B) − P(A ∩ B)


which is our result. ∎


Applying the above theorem twice (Problem 3.34), we obtain: 
Corollary 3.7: For any events A, B, C, we have

P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(A ∩ B) − P(A ∩ C) − P(B ∩ C) + P(A ∩ B ∩ C)


Clearly, like the analogous inclusion-exclusion principle for sets, the addition rule can be extended 
by induction to any finite number of sets. 
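As a sanity check of the addition rule and its three-event corollary, the sketch below evaluates both sides on the dice sample space of Example 3.2. It assumes that all 36 outcomes are equally likely (the equiprobable model of Section 3.4), and the event C (both dice show the same number) is a hypothetical third event chosen only for illustration.

```python
from fractions import Fraction
from itertools import product

# The 36 ordered pairs from the toss of two dice.
S = set(product(range(1, 7), repeat=2))

def P(event):
    """Probability under the equally-likely assumption: n(event) / n(S)."""
    return Fraction(len(event), len(S))

A = {s for s in S if s[0] + s[1] == 6}   # sum is 6
B = {s for s in S if max(s) == 4}        # largest number is 4
C = {s for s in S if s[0] == s[1]}       # hypothetical: both numbers equal

# Right-hand side of the addition rule for two events.
two = P(A) + P(B) - P(A & B)
# Right-hand side of the corollary for three events.
three = (P(A) + P(B) + P(C)
         - P(A & B) - P(A & C) - P(B & C)
         + P(A & B & C))

print(two == P(A | B))        # True
print(three == P(A | B | C))  # True
```

Exact fractions are used so that the two sides can be compared with `==` without rounding concerns.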


3.4 FINITE PROBABILITY SPACES 


Consider a finite sample space S where we assume, unless otherwise stated, that the class 𝒞 of all
events consists of all subsets of S. As noted above, S becomes a probability space by assigning
probabilities to the events in 𝒞 so they satisfy the probability axioms. This section shows how this
is usually done when the sample space S is finite. The next section discusses infinite sample spaces.


Finite Equiprobable Spaces 


Suppose S is a finite sample space, and suppose the physical characteristics of the experiment 
suggest that the various outcomes of the experiment be assigned equal probabilities. Such a 
probability space S, where each point is assigned the same probability, is called a finite equiprobable 
space. Specifically, if S has n elements, then each point in S is assigned the probability 1/n and each 
event A containing r points is assigned the probability r/n. In other words,


P(A) = (number of elements in A)/(number of elements in S) = n(A)/n(S)

or

P(A) = (number of ways that the event A can occur)/(number of ways that the sample space S can occur)


We emphasize that the above formula for P(A) can only be used with respect to an equiprobable 
space, and cannot be used in general. 


We state the above result formally. 


Theorem 3.8: Let S be a finite sample space and, for any subset A of S, let P(A) = n(A)/n(S). Then
P satisfies axioms [P₁], [P₂], and [P₃].


The expression “at random” will be used only with respect to an equiprobable space; formally, the
statement “choose a point at random from a set S” shall mean that S is an equiprobable space where
each point in S has the same probability.

EXAMPLE 3.4 A card is selected at random from an ordinary deck of 52 playing cards. (See Fig. 3-4.) Consider
the following events [where a face card is a jack (J), queen (Q), or king (K)]:


A = {heart} and B = {face card}

(a) Find P(A), P(B), and P(A ∩ B). (b) Find P(A ∪ B).

(a) Since we have an equiprobable space,

P(A) = (number of hearts)/(number of cards) = 13/52 = 1/4

P(B) = (number of face cards)/(number of cards) = 12/52 = 3/13

P(A ∩ B) = (number of heart face cards)/(number of cards) = 3/52


(b) Since we want P(A ∪ B), the probability that the card is a heart or a face card, we can count the number
of such cards and use Theorem 3.8. Alternately, we can use (a) and the Addition Rule (Theorem 3.6) to
obtain

P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 13/52 + 12/52 − 3/52 = 22/52 = 11/26
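The counting in Example 3.4 can be reproduced by enumerating the deck. This is a small sketch; the rank and suit labels are our own encoding of the 52 cards, not notation from the text.

```python
from fractions import Fraction
from itertools import product

ranks = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
suits = ["C", "D", "H", "S"]
deck = set(product(ranks, suits))        # 52 (rank, suit) pairs

A = {card for card in deck if card[1] == "H"}               # hearts
B = {card for card in deck if card[0] in {"J", "Q", "K"}}   # face cards

def P(event):
    """Equiprobable space: P(E) = n(E) / n(S)."""
    return Fraction(len(event), len(deck))

print(P(A), P(B), P(A & B))   # 1/4 3/13 3/52
print(P(A | B))               # 11/26
```

Counting the union directly gives the same value as the addition rule, which is the point of part (b).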


EXAMPLE 3.5 Suppose a student is selected at random from 80 students where 30 are taking mathematics, 20 
are taking chemistry, and 10 are taking mathematics and chemistry. Find the probability p that the student is 
taking mathematics (M) or chemistry (C). 


Since the space is equiprobable, we have: 


P(M) = 30/80 = 3/8,   P(C) = 20/80 = 1/4,   P(M and C) = P(M ∩ C) = 10/80 = 1/8

Thus, by the Addition Rule (Theorem 3.6),

p = P(M or C) = P(M ∪ C) = P(M) + P(C) − P(M ∩ C) = 3/8 + 1/4 − 1/8 = 1/2


Finite Probability Spaces 


Let S be a finite sample space, say S = {a₁, a₂, ..., aₙ}. A finite probability space, or finite
probability model, is obtained by assigning to each point aᵢ in S a real number pᵢ, called the probability
of aᵢ, satisfying the following properties:

(i) Each pᵢ is nonnegative, that is, pᵢ ≥ 0.
(ii) The sum of the pᵢ is 1, that is,


Σ pᵢ = p₁ + p₂ + ··· + pₙ = 1

The probability P(A) of an event A is defined as the sum of the probabilities of the points in A,
that is,

P(A) = Σ P(aᵢ) = Σ pᵢ     (sums taken over all aᵢ ∈ A)

For notational convenience, we write P(aᵢ) instead of P({aᵢ}).
Sometimes the points in a finite sample space S and their assigned probabilities are given in the
form of a table as follows:


Outcome     | a₁  a₂  ···  aₙ
Probability | p₁  p₂  ···  pₙ


Such a table is called a probability distribution.

The fact that P(A), the sum of the probabilities of the points in A, does define a probability space
is stated formally below (and proved in Problem 3.32).


Theorem 3.9: The above function P(A) satisfies the axioms [P₁], [P₂], and [P₃].

EXAMPLE 3.6 Experiment Let three coins be tossed and the number of heads observed. [Compare with
Example 3.1(b).] Then the sample space is S = {0, 1, 2, 3}. The following assignments on the elements of S
define a probability space:


Outcome     | 0    1    2    3
Probability | 1/8  3/8  3/8  1/8


That is, each probability is nonnegative, and the sum of the probabilities is 1. Let A be the event that at least
one head appears, and let B be the event that all heads or all tails appear, that is, let


A = {1, 2, 3} and B = {0, 3}

Then, by definition,

P(A) = P(1) + P(2) + P(3) = 3/8 + 3/8 + 1/8 = 7/8

and

P(B) = P(0) + P(3) = 1/8 + 1/8 = 1/4
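The distribution in Example 3.6 can be recovered by counting, under the assumption (ours, consistent with fair coins) that the 8 ordered outcomes of three tosses are equally likely. The sketch below builds the table and then sums point probabilities to get P(A) and P(B).

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 8 ordered outcomes of three tosses, assumed equally likely.
tosses = list(product("HT", repeat=3))

# Count how many outcomes give 0, 1, 2, or 3 heads.
counts = Counter(t.count("H") for t in tosses)
dist = {k: Fraction(v, len(tosses)) for k, v in sorted(counts.items())}

def P(event):
    """Sum of the probabilities of the points in the event."""
    return sum((dist[x] for x in event), Fraction(0))

A = {1, 2, 3}   # at least one head
B = {0, 3}      # all heads or all tails

print(dist)         # number of heads -> probability
print(P(A), P(B))   # 7/8 1/4
```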


EXAMPLE 3.7 Three horses A, B, C are in a race; A is twice as likely to win as B, and B is twice as likely to 
win as C. 

(a) Find their respective probabilities of winning, that is, find P(A), P(B), P(C). 

(b) Find the probability that B or C wins. 


(a) Let P(C) = p. Since B is twice as likely to win as C, P(B) = 2p; and since A is twice as likely to win as B,
P(A) = 2P(B) = 2(2p) = 4p. Now the sum of the probabilities must be 1; hence


p + 2p + 4p = 1   or   7p = 1   or   p = 1/7

Accordingly, P(A) = 4p = 4/7, P(B) = 2p = 2/7, P(C) = p = 1/7.

(b) Note {B, C} is the event that B or C wins, so we want P({B, C}). By definition, we simply add up the
probabilities of the points in {B, C}. Thus


P({B, C}) = P(B) + P(C) = 2/7 + 1/7 = 3/7
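The method of Example 3.7 amounts to normalizing relative weights so that they sum to 1. A minimal sketch follows; the dictionary `weights` is our own encoding of "A is twice as likely as B, and B twice as likely as C".

```python
from fractions import Fraction

# Relative likelihoods: C has weight 1, B twice that, A twice B.
weights = {"A": 4, "B": 2, "C": 1}

# Divide by the total so that the probabilities sum to 1.
total = sum(weights.values())
prob = {horse: Fraction(w, total) for horse, w in weights.items()}

p_B_or_C = prob["B"] + prob["C"]

print(prob["A"], prob["B"], prob["C"])   # 4/7 2/7 1/7
print(p_B_or_C)                          # 3/7
```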


3.5 INFINITE SAMPLE SPACES 


This section considers infinite sample spaces S. There are two cases, the case where S is countably
infinite and the case where S is uncountable. We note that a finite or a countably infinite probability
space S is said to be discrete, whereas an uncountable space S is said to be nondiscrete. Moreover,
an uncountable space S which consists of a continuum of points, such as an interval or product of
intervals, is said to be continuous.


Countably Infinite Sample Spaces


Suppose S is a countably infinite sample space; say

S = {a₁, a₂, a₃, ...}


Then, as in the finite case, we obtain a probability space by assigning each aᵢ ∈ S a real number pᵢ,
called its probability, such that:


(i) Each pᵢ is nonnegative, that is, pᵢ ≥ 0.
(ii) The sum of the pᵢ is equal to 1, that is,


p₁ + p₂ + p₃ + ··· = Σ pᵢ = 1     (sum over i = 1, 2, 3, ...)


The probability P(A) of an event A is then the sum of the probabilities of its points.


EXAMPLE 3.8 Consider the sample space S = {1, 2, 3, ..., ∞} of the experiment of tossing a coin until a head
appears; here n denotes the number of times the coin is tossed. A probability space is obtained by setting


P(1) = 1/2, P(2) = 1/4, P(3) = 1/8, ..., P(n) = 1/2ⁿ, ..., P(∞) = 0

Consider the events:
A = {n is at most 3} = {1, 2, 3} and B = {n is even} = {2, 4, 6, ...}


Find P(A) and P(B).
Adding the probabilities of the points in the sets (events) yields:


P(A) = P(1, 2, 3) = 1/2 + 1/4 + 1/8 = 7/8

P(B) = P(2, 4, 6, 8, ...) = 1/4 + 1/16 + 1/64 + ···


Note that P(B) is a geometric series with a = 1/4 and r = 1/4; hence

P(B) = a/(1 − r) = (1/4)/(3/4) = 1/3
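The infinite sum for P(B) in Example 3.8 can be explored numerically. The sketch below (the helper names are ours) computes exact partial sums of the series and compares them with the closed form a/(1 − r).

```python
from fractions import Fraction

def P(n):
    """Probability that the first head appears on toss n: 1 / 2**n."""
    return Fraction(1, 2 ** n)

def partial_sum_even(terms):
    """Sum of P(2) + P(4) + ... over the first `terms` even values of n."""
    return sum(P(2 * k) for k in range(1, terms + 1))

p_A = P(1) + P(2) + P(3)

# Closed form of the geometric series with a = 1/4 and r = 1/4.
a, r = Fraction(1, 4), Fraction(1, 4)
p_B = a / (1 - r)

print(p_A)                                 # 7/8
print(p_B)                                 # 1/3
print(float(p_B - partial_sum_even(20)))   # remaining tail, already tiny
```

The partial sums never reach 1/3 exactly; the gap after N terms is 1/(3 · 4ᴺ), which shrinks to zero as N grows.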


Uncountable Spaces 


The only uncountable sample spaces S which we will consider here are those with some finite 
geometrical measurement m(S), such as length, area, or volume, and where a point in S is selected at 
random. The probability of an event A, that is, that the selected point belongs to A, is then the ratio 
of m(A) to m(S). Thus 

P(A) = (length of A)/(length of S)   or   P(A) = (area of A)/(area of S)   or   P(A) = (volume of A)/(volume of S)


Such a probability space S is said to be uniform. 


EXAMPLE 3.9 A point is chosen at random inside a rectangle measuring 3 by 5 in. Find the probability p that
the point is at least 1 in from the edge.

Let S denote the set of points inside the rectangle and let A denote the set of points at least 1 in from the
edge. S and A are pictured in Fig. 3-6. Note that A is a rectangular area measuring 1 in by 3 in. Thus


p = (area of A)/(area of S) = (1 · 3)/(3 · 5) = 1/5


Fig. 3-6. A is shaded.
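The area ratio in Example 3.9 can be checked by simulation. The following is a rough Monte Carlo sketch, not part of the text's method: it places the rectangle with corners (0, 0) and (5, 3), which is our own choice of coordinates, draws points uniformly, and counts those at least 1 in from every edge. The trial count and seed are arbitrary.

```python
import random

def estimate(trials=100_000, seed=0):
    """Estimate P(point is at least 1 in from the edge) by sampling."""
    rng = random.Random(seed)   # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        x = rng.uniform(0, 5)   # horizontal coordinate in the 5 in side
        y = rng.uniform(0, 3)   # vertical coordinate in the 3 in side
        # At least 1 in from all four edges means 1 <= x <= 4 and 1 <= y <= 2.
        if 1 <= x <= 4 and 1 <= y <= 2:
            hits += 1
    return hits / trials

exact = (1 * 3) / (3 * 5)   # area of A divided by area of S
print(exact)                # 0.2
print(estimate())           # should be close to 0.2
```

The estimate only approximates the exact ratio; its accuracy improves roughly like one over the square root of the number of trials.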


3.6 CLASSICAL BIRTHDAY PROBLEM 




The classical birthday problem concerns the probability that n people have distinct birthdays
where n ≤ 365. Here we ignore leap years and assume that a person’s birthday can fall on any day
with equal probability.

Since there are n people and 365 different days, there are 365ⁿ ways in which the n people can have
their birthdays. On the other hand, if the n persons are to have distinct birthdays, then:

(i) The first person can be born on any of the 365 days.
(ii) The second person can be born on the remaining 364 days.
(iii) The third person can be born on the remaining 363 days, and so on.

Thus there are

365 · 364 · 363 ··· (365 − n + 1)

ways that n persons can have distinct birthdays. Therefore


P(n people have distinct birthdays) = [365 · 364 · 363 ··· (365 − n + 1)] / 365ⁿ


Accordingly, the probability p that two or more people have the same birthday is as follows:


p = 1 − [probability that no two people have the same birthday]
  = 1 − [365 · 364 · 363 ··· (365 − n + 1)] / 365ⁿ


The value of p where n is a multiple of 10 up to 60 follows:


n | 10     20     30     40     50     60
p | 0.117  0.411  0.706  0.891  0.970  0.994


We note that p = 0.476 for n = 22 and that p = 0.507 for n = 23. Accordingly:


In a group of 23 people, it is more likely
that at least two of them have the same birthday
than that they all have distinct birthdays.


The above table also tells us that, in a group of 60 or more people, the probability that two or more
of them have the same birthday exceeds 99 percent.
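The birthday probabilities are easy to compute directly from the product formula. The sketch below (the function name `p_shared` is ours) multiplies the factors (365 − k)/365 one at a time, which avoids forming the very large numbers 365ⁿ, and uses the same assumptions as the text: 365 equally likely days and no leap years.

```python
def p_shared(n, days=365):
    """Probability that at least two of n people share a birthday."""
    p_distinct = 1.0
    for k in range(n):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p_distinct *= (days - k) / days
    return 1.0 - p_distinct

for n in (10, 20, 30, 40, 50, 60):
    print(n, round(p_shared(n), 3))

print(round(p_shared(22), 3), round(p_shared(23), 3))   # 0.476 0.507
```

Multiplying the ratios term by term is numerically safer than dividing two huge integers, and it mirrors the step-by-step counting argument above.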
Solved Problems


SAMPLE SPACES AND EVENTS

3.1. Let A and B be events. Find an expression and exhibit the Venn diagram for the event:
(a) A but not B, (b) neither A nor B, (c) either A or B, but not both.


(a) Since A but not B occurs, shade the area of A outside of B, as in Fig. 3-7(a). Note that Bᶜ, the
complement of B, occurs, since B does not occur; hence A and Bᶜ occur. In other words, the event
is A ∩ Bᶜ.

(b) “Neither A nor B” means “not A and not B” or Aᶜ ∩ Bᶜ. By DeMorgan’s law, this is also the set
(A ∪ B)ᶜ; hence shade the area outside of A and outside of B, that is, outside A ∪ B, as in
Fig. 3-7(b).

(c) Since A or B, but not both, occurs, shade the area of A and B, except where they intersect, as
in Fig. 3-7(c). The event is equivalent to the occurrence of A but not B or B but not A. Thus,
the event is (A ∩ Bᶜ) ∪ (B ∩ Aᶜ). Alternately, the event is A ⊕ B, the symmetric difference of A
and B.


(a) A but not B. (b) Neither A nor B. (c) A or B, but not both.


Fig. 3-7


3.2. Let A, B, C be events. Find an expression and exhibit the Venn diagram for the event: 
(a) A and B but not C occurs, (b) only A occurs. 


(a) Since A and B but not C occurs, shade the intersection of A and B which lies outside of C, as in
Fig. 3-8(a). The event consists of the elements in A, in B, and in Cᶜ (not in C), that is, the event is
the intersection A ∩ B ∩ Cᶜ.

(b) Since only A is to occur, shade the area of A which lies outside of B and C, as in Fig. 3-8(b). The
event consists of the elements in A, in Bᶜ (not in B), and in Cᶜ (not in C), that is, the event is the
intersection A ∩ Bᶜ ∩ Cᶜ.


(a) A and B but not C occurs. (b) Only A occurs.


Fig. 3-8


3.3. Let a coin and a die be tossed; and let the sample space S consist of the 12 elements:

S = {H1, H2, H3, H4, H5, H6, T1, T2, T3, T4, T5, T6}

Express explicitly the following events:
(a) A = {heads and an even number}, (b) B = {a number less than 3},
(c) C = {tails and an odd number}.


(a) The elements of A are those elements of S which consist of an H and an even number; hence

A = {H2, H4, H6}

(b) The elements of B are those elements of S whose second component is less than 3, that is, 1 or 2;
hence

B = {H1, H2, T1, T2}

(c) The elements of C are those elements of S which consist of a T and an odd number; hence

C = {T1, T3, T5}


3.4. Consider the events A, B, C in the preceding Problem 3.3. Express explicitly the event that:
(a) A or B occurs. (b) B and C occur. (c) Only B occurs.
Which pair of the events A, B, C are mutually exclusive?

(a) “A or B” is the set union A ∪ B; hence

A ∪ B = {H2, H4, H6, H1, T1, T2}

(b) “B and C” is the set intersection B ∩ C; hence

B ∩ C = {T1}

(c) “Only B” consists of the elements of B which are not in A and not in C, that is, the set intersection
B ∩ Aᶜ ∩ Cᶜ; hence

B ∩ Aᶜ ∩ Cᶜ = {H1, T2}

Only A and C are mutually exclusive, that is, A ∩ C = ∅.


3.5. A pair of dice is tossed and the two numbers appearing on the top are recorded. Recall that
S consists of 36 pairs of numbers which are pictured in Fig. 3-3. Find the number of elements
in each of the following events:
(a) A = {two numbers are equal}          (c) C = {5 appears on first die}
(b) B = {sum is 10 or more}              (d) D = {5 appears on at least one die}

Use Fig. 3-3 to help count the number of elements in each of the events:
(a) A = {(1,1), (2,2), ..., (6,6)}, so n(A) = 6.
(b) B = {(6,4), (5,5), (4,6), (6,5), (5,6), (6,6)}, so n(B) = 6.
(c) C = {(5,1), (5,2), ..., (5,6)}, so n(C) = 6.

(d) There are six pairs with 5 as the first element, and six pairs with 5 as the second element. However,
(5,5) appears in both places. Hence

n(D) = 6 + 6 − 1 = 11

Alternately, count the pairs in Fig. 3-3 which are in D to get n(D) = 11.




FINITE EQUIPROBABLE SPACES 


3.6. Determine the probability p of each event:
(a) An even number appears in the toss of a fair die.
(b) At least one tail appears in the toss of 3 fair coins.
(c) A white marble appears in the random drawing of 1 marble from a box containing 4 white,
3 red, and 5 blue marbles.


Each sample space S is an equiprobable space. Hence, for each event E, use

P(E) = (number of elements in E)/(number of elements in S) = n(E)/n(S)

(a) The event can occur in three ways (a 2, 4, or 6) out of 6 equally likely cases; hence p = 3/6 = 1/2.
(b) Assuming the coins are distinguished, there are 8 cases:
HHH, HHT, HTH, HTT, THH, THT, TTH, TTT
Only the first case is not favorable; hence p = 7/8.
(c) There are 4 + 3 + 5 = 12 marbles of which 4 are white; hence p = 4/12 = 1/3.


3.7. A single card is drawn from an ordinary deck S of 52 cards. (See Fig. 3-4.) Find the probability
p that the card is a: (a) king, (b) face card (jack, queen, or king), (c) red card (heart or
diamond), (d) red face card.


Here n(S) = 52.
(a) There are 4 kings; hence p = 4/52 = 1/13.
(b) There are 4(3) = 12 face cards; hence p = 12/52 = 3/13.
(c) There are 13 hearts and 13 diamonds; hence p = 26/52 = 1/2.
(d) There are 6 face cards which are red; hence p = 6/52 = 3/26.


3.8. Consider the sample space S and events A, B, C in Problem 3.3 where a coin and a die are
tossed. Suppose the coin and die are fair; hence S is an equiprobable space. Find:
(a) P(A), P(B), P(C), (b) P(A ∪ B), P(B ∩ C), P(B ∩ Aᶜ ∩ Cᶜ)

Since S is an equiprobable space, use P(E) = n(E)/n(S). Here n(S) = 12. We need only count the
number of elements in each given set, and then divide by 12.
(a) By Problem 3.3, P(A) = 3/12, P(B) = 4/12, P(C) = 3/12.
(b) By Problem 3.4, P(A ∪ B) = 6/12, P(B ∩ C) = 1/12, P(B ∩ Aᶜ ∩ Cᶜ) = 2/12.


3.9. A box contains 15 billiard balls which are numbered from 1 to 15. A ball is drawn at random
and the number recorded. Find the probability p that the number is:
(a) even, (b) less than 5, (c) even and less than 5, (d) even or less than 5.

(a) There are 7 numbers, 2, 4, 6, 8, 10, 12, 14, which are even; hence p = 7/15.
(b) There are 4 numbers, 1, 2, 3, 4, which are less than 5; hence p = 4/15.
(c) There are 2 numbers, 2 and 4, which are even and less than 5; hence p = 2/15.
(d) By the addition rule (Theorem 3.6),

p = 7/15 + 4/15 − 2/15 = 9/15

Alternately, there are 9 numbers, 1, 2, 3, 4, 6, 8, 10, 12, 14, which are even or less than 5; hence
p = 9/15.
3.10. A box contains 2 white sox and 2 blue sox. Two sox are drawn at random. Find the
probability p they are a match (same color).


There are C(4, 2) = 6 ways to draw 2 of the sox. Only two pairs will yield a match. Thus
p = 2/6 = 1/3.


3.11. Five horses are in a race. Audrey picks 2 of the horses at random and bets on them. Find the
probability p that Audrey picked the winner.

There are C(5, 2) = 10 ways to pick 2 of the horses. Four of the pairs will contain the
winner. Thus, p = 4/10 = 2/5.


3.12. A class contains 10 men and 20 women of which half the men and half the women have brown
eyes. Find the probability p that a person chosen at random is a man or has brown eyes.


Let A = {men}, B = {brown eyes}. We seek P(A ∪ B). First find:

P(A) = 10/30 = 1/3,   P(B) = 15/30 = 1/2,   P(A ∩ B) = 5/30 = 1/6

Thus, by the addition rule (Theorem 3.6),

P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 1/3 + 1/2 − 1/6 = 2/3


3.13. Six married couples are standing in a room. Two people are chosen at random. Find the
probability p that: (a) they are married; (b) one is male and one is female.


There are C(12, 2) = 66 ways to choose 2 people from the 12 people.

(a) There are 6 married couples; hence p = 6/66 = 1/11.

(b) There are 6 ways to choose the male and 6 ways to choose the female; hence p = (6 · 6)/66 =
36/66 = 6/11.


3.14. Suppose 5 marbles are placed in 5 boxes at random. Find the probability p that exactly 1 of
the boxes is empty.


There are exactly 5⁵ ways to place the 5 marbles in the 5 boxes. If exactly 1 box is empty, then 1 box
contains 2 marbles and each of the remaining boxes contains 1 marble. There are 5 ways to select the
empty box, then 4 ways to select the box containing 2 marbles, and C(5, 2) = 10 ways to select 2 marbles
to go into this box. Finally, there are 3! ways to distribute the remaining 3 marbles among the remaining
3 boxes. Thus

p = (5 · 4 · 10 · 3!)/5⁵ = 1200/3125 = 48/125
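The count in Problem 3.14 can be confirmed by brute force, since there are only 5⁵ = 3125 placements. The sketch below treats the marbles and boxes as distinguishable (as the counting argument does) and uses the observation that exactly one box is empty precisely when the marbles occupy 4 distinct boxes.

```python
from fractions import Fraction
from itertools import product

boxes = range(5)

# placement[k] is the box that receives marble k; 5**5 placements in all.
placements = list(product(boxes, repeat=5))

# Exactly one empty box <=> exactly four distinct boxes are used.
favorable = sum(1 for placement in placements if len(set(placement)) == 4)

p = Fraction(favorable, len(placements))
print(favorable, len(placements))   # 1200 3125
print(p)                            # 48/125
```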
3.15. Two cards are drawn at random from an ordinary deck of 52 cards. (See Fig. 3-4.) Find the
probability p that: (a) both are hearts, (b) one is a heart and one is a spade.


There are C(52, 2) = 1326 ways to choose 2 cards from the 52-card deck. In other words,
n(S) = 1326.


(a) There are C(13, 2) = 78 ways to draw 2 hearts from the 13 hearts; hence

p = (number of ways 2 hearts can be drawn)/(number of ways 2 cards can be drawn) = 78/1326 = 3/51 = 1/17

(b) There are 13 hearts and 13 spades, so there are 13 · 13 = 169 ways to draw a heart and a spade.
Thus, p = 169/1326 = 13/102.


FINITE PROBABILITY SPACES 


3.16. A sample space S consists of four elements, that is, S = {a₁, a₂, a₃, a₄}. Under which of the
following functions P does S become a probability space?

(a) P(a₁) = 0.4, P(a₂) = 0.3, P(a₃) = 0.2, P(a₄) = 0.3.

(b) P(a₁) = 0.4, P(a₂) = −0.2, P(a₃) = 0.7, P(a₄) = 0.1.

(c) P(a₁) = 0.4, P(a₂) = 0.2, P(a₃) = 0.1, P(a₄) = 0.3.

(d) P(a₁) = 0.4, P(a₂) = 0, P(a₃) = 0.5, P(a₄) = 0.1.

(a) The sum of the values on the points in S exceeds one; hence P does not define S to be a probability
space.

(b) Since P(a₂) is negative, P does not define S to be a probability space.

(c) Each value is nonnegative and their sum is one; hence P does define S to be a probability space.

(d) Although P(a₂) = 0, each value is still nonnegative and their sum does equal one. Thus, P does define
S to be a probability space.


3.17. A coin is weighted so that heads is twice as likely to appear as tails. Find P(T) and P(H).


Let P(T) = p; then P(H) = 2p. Now set the sum of the probabilities equal to one, that is, set
p + 2p = 1. Then p = 1/3. Thus P(T) = 1/3 and P(H) = 2/3.


3.18. Suppose A and B are events with P(A) = 0.6, P(B) = 0.3, and P(A ∩ B) = 0.2. Find the
probability that:

(a) A does not occur.          (c) A or B occurs.
(b) B does not occur.          (d) Neither A nor B occurs.


(a) By the complement rule, P(not A) = P(Aᶜ) = 1 − P(A) = 0.4.
(b) By the complement rule, P(not B) = P(Bᶜ) = 1 − P(B) = 0.7.
(c) By the addition rule,

P(A or B) = P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
          = 0.6 + 0.3 − 0.2 = 0.7

(d) Recall [Fig. 3-7(b)] that neither A nor B is the complement of A ∪ B. Therefore

P(neither A nor B) = P((A ∪ B)ᶜ) = 1 − P(A ∪ B) = 1 − 0.7 = 0.3
3.19. A die is weighted so that the outcomes produce the following probability distribution:


Outcome     | 1    2    3    4    5    6
Probability | 0.1  0.3  0.2  0.1  0.1  0.2


Consider the events:
A = {even number}, B = {2, 3, 4, 5}, C = {x : x < 3}, D = {x : x > 7}
Find the following probabilities:
(a) P(A)   (b) P(B)   (c) P(C)   (d) P(D)


For any event E, find P(E) by summing the probabilities of the elements in E.
(a) A = {2, 4, 6}, so P(A) = 0.3 + 0.1 + 0.2 = 0.6.
(b) P(B) = 0.3 + 0.2 + 0.1 + 0.1 = 0.7.
(c) C = {1, 2}, so P(C) = 0.1 + 0.3 = 0.4.
(d) D = ∅, the empty set. Hence P(D) = 0.


3.20. For the data in Problem 3.19, find: (a) P(A ∩ B), (b) P(B ∪ C), (c) P(B ∩ C).
First find the elements in the event, and then add the probabilities of the elements.

(a) A ∩ B = {2, 4}, so P(A ∩ B) = 0.3 + 0.1 = 0.4.

(b) B ∪ C = {1, 2, 3, 4, 5} = {6}ᶜ, so P(B ∪ C) = 1 − 0.2 = 0.8.

(c) B ∩ C = {2}, so P(B ∩ C) = 0.3.


3.21. Let A and B be events such that P(A ∪ B) = 0.8, P(A) = 0.4, and P(A ∩ B) = 0.3. Find:
(a) P(Aᶜ); (b) P(B); (c) P(A ∩ Bᶜ); (d) P(Aᶜ ∩ Bᶜ).

(a) By the complement rule, P(Aᶜ) = 1 − P(A) = 1 − 0.4 = 0.6.
(b) By the addition rule, P(A ∪ B) = P(A) + P(B) − P(A ∩ B). Substitute in this formula to obtain:

0.8 = 0.4 + P(B) − 0.3   or   P(B) = 0.7

(c) P(A ∩ Bᶜ) = P(A \ B) = P(A) − P(A ∩ B) = 0.4 − 0.3 = 0.1.
(d) By DeMorgan’s law, (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ. Thus

P(Aᶜ ∩ Bᶜ) = P((A ∪ B)ᶜ) = 1 − P(A ∪ B) = 1 − 0.8 = 0.2


3.22. Suppose S = {a₁, a₂, a₃, a₄}, and suppose P is a probability function defined on S.
(a) Find P(a₁) if P(a₂) = 0.4, P(a₃) = 0.2, P(a₄) = 0.1.
(b) Find P(a₁) and P(a₂) if P(a₃) = P(a₄) = 0.2 and P(a₁) = 3P(a₂).


(a) Let P(a₁) = p. For P to be a probability function, the sum of the probabilities on the sample points
must equal one. Thus, we have

p + 0.4 + 0.2 + 0.1 = 1   or   p = 0.3

(b) Let P(a₂) = p so P(a₁) = 3p. Thus

3p + p + 0.2 + 0.2 = 1   or   p = 0.15

Hence P(a₂) = 0.15 and P(a₁) = 0.45.
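The rule "sum the probabilities of the points in the event", used in Problems 3.19 and 3.20, translates directly into code. The sketch below stores the weighted-die distribution of Problem 3.19 as exact fractions; the helper `P` and the use of Python sets for events are our own illustration.

```python
from fractions import Fraction

# Distribution of the weighted die in Problem 3.19, in tenths.
dist = {1: Fraction(1, 10), 2: Fraction(3, 10), 3: Fraction(2, 10),
        4: Fraction(1, 10), 5: Fraction(1, 10), 6: Fraction(2, 10)}

def P(event):
    """Sum the probabilities of the sample points that lie in the event."""
    return sum((dist[x] for x in event if x in dist), Fraction(0))

A = {x for x in dist if x % 2 == 0}   # even number
B = {2, 3, 4, 5}
C = {x for x in dist if x < 3}
D = {x for x in dist if x > 7}        # no outcome exceeds 7, so D is empty

print(P(A), P(B), P(C), P(D))   # 3/5 7/10 2/5 0
print(P(A & B), P(B & C))       # 2/5 3/10
```

Fractions are used instead of floats so that sums such as 0.3 + 0.1 + 0.2 come out exactly.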
ODDS


3.23. Suppose P(E) = p. The odds that E occurs are defined to be the ratio p : (1 − p). Find p if the
odds that E occurs are a to b.


Set the ratio p : (1 − p) equal to a : b to obtain

p/(1 − p) = a/b   or   bp = a − ap   or   ap + bp = a   or   p = a/(a + b)


3.24. The odds that an event E occurs are 3 to 2. Find the probability of E.
Let p = P(E). Set the odds equal to p : (1 − p) to obtain

p/(1 − p) = 3/2   or   2p = 3 − 3p   or   5p = 3   or   p = 3/5

Alternately, use the formula in Problem 3.23 to directly obtain

p = a/(a + b) = 3/(3 + 2) = 3/5


3.25. Suppose P(E) = 5/12. Express the odds that E occurs in terms of positive integers.
First compute 1 − P(E) = 7/12. The odds that E occurs are

P(E)/(1 − P(E)) = (5/12)/(7/12) = 5/7

Thus, the odds are 5 to 7.
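The two conversions between odds and probability used in the problems on odds can be written as a pair of small functions. This is a sketch with function names of our own choosing; it assumes 0 < p < 1 so that the ratio p : (1 − p) is defined.

```python
from fractions import Fraction

def prob_from_odds(a, b):
    """Odds of a to b correspond to probability p = a / (a + b)."""
    return Fraction(a, a + b)

def odds_from_prob(p):
    """Return the odds p : (1 - p) as a pair of positive integers in lowest terms."""
    p = Fraction(p)
    ratio = p / (1 - p)   # Fraction reduces the ratio automatically
    return ratio.numerator, ratio.denominator

print(prob_from_odds(3, 2))             # 3/5
print(odds_from_prob(Fraction(5, 12)))  # (5, 7)
```

The two functions are inverses of each other: converting odds of a to b into a probability and back returns the same odds in lowest terms.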


UNCOUNTABLE UNIFORM SPACES 


3.26. A point is chosen at random inside a circle. Find the probability p that the point is closer to 
the center of the circle than to its circumference. 
Let S denote the set of points inside the circle with radius r, and let A denote the set of points inside
the concentric circle with radius r/2, as pictured in Fig. 3-9(a). Thus, A consists precisely of those points
of S which are closer to the center than to its circumference. Therefore

p = P(A) = (area of A)/(area of S) = π(r/2)²/(πr²) = 1/4


(a) A is shaded. (b) A is shaded.


Fig. 3-9


3.27. Consider the plane R², and let X denote the subset of points with integer coordinates. A coin
of radius 1/4 is tossed randomly on the plane. Find the probability that the coin covers a point
of X.


Let S denote the set of points inside a square with corners

(m, n), (m, n + 1), (m + 1, n), (m + 1, n + 1)

where m and n are integers. Let A denote the set of points in S with distance less than 1/4 from any
corner point, as pictured in Fig. 3-9(b). Note that the area of A is equal to the area inside a circle of radius
1/4. Suppose the center of the coin falls in S. Then the coin will cover a point in X if and only if its
center falls in A. Accordingly,

p = (area of A)/(area of S) = π(1/4)²/1 = π/16 ≈ 0.2


(Note: We cannot take S to be all of R² since the area of R² is infinite.)


3.28. On the real line R, points a and b are selected at random such that 0 ≤ a ≤ 3 and −2 ≤ b ≤ 0,
as shown in Fig. 3-10(a). Find the probability p that the distance between a and b is greater
than 3.


The sample space S consists of the ordered pairs (a, b) and so forms a rectangular region shown in
Fig. 3-10(b). On the other hand, the set A of points (a, b) for which d = a − b > 3 consists of those points
which lie below the line x − y = 3, and hence form the shaded region in Fig. 3-10(b). Thus

p = P(A) = (area of A)/(area of S) = 2/6 = 1/3


(a) (b) 


Fig. 3-10 


3.29. Three points are selected at random from the circumference of a circle. Find the probability
p that the three points lie on a semicircle.

Suppose the length of the circumference is 2s, and let a, b, c denote the three points. Let x denote
the clockwise arc length from a to b, and let y denote the clockwise arc length from a to c, as pictured
in Fig. 3-11(a). Thus


0 < x < 2s and 0 < y < 2s     (*)


Let S denote the set of points in R² for which the condition (*) holds. Let A denote the set of points of
S for which any of the following conditions hold:

(i) x, y < s,          (iii) x < s and y − x > s,
(ii) x, y > s,         (iv) y < s and x − y > s.

Then A, shaded in Fig. 3-11(b), consists of those points for which a, b, c lie on a semicircle. Thus

p = (area of A)/(area of S) = 3s²/4s² = 3/4


[Figure: (a) the points a, b, c on the circle with the arcs x and y; (b) the regions (i) to (iv) in the square S.]
(b) A is shaded.


Fig. 3-11


3.30. A coin of diameter 1/2 is tossed randomly onto the plane R². Find the probability p that the
coin does not intersect any line of the form x = k where k is an integer.


The lines are all vertical, and the distance between adjacent lines is one. Let S denote a horizontal
line segment between adjacent lines, say, x = k and x = k + 1; and let A denote the points of S which are
at least 1/4 from either end, as pictured in Fig. 3-12. Note that the length of S is 1 and the length of A
is 1/2. Suppose the center of the coin falls in S. Then the coin will not intersect the lines if and only if
its center falls in A. Accordingly

p = (length of A)/(length of S) = (1/2)/1 = 1/2


[Figure: the segment S between the lines x = k and x = k + 1, with A marked.]


Fig. 3-12




MISCELLANEOUS PROBLEMS 
3.31. Show that axiom [P₃] follows from axiom [P₃′].


First we show that P(∅) = 0 using [P₃′] instead of [P₃]. We have ∅ = ∅ ∪ ∅ ∪ ∅ ∪ ··· where the
empty sets are disjoint. Say P(∅) = a. Then, by [P₃′],


P(∅) = P(∅ ∪ ∅ ∪ ∅ ∪ ···) = P(∅) + P(∅) + P(∅) + ···


However, zero is the only real number a satisfying a = a + a + a + ···. Therefore, P(∅) = 0.
Suppose A and B are disjoint. Then A, B, ∅, ∅, ... are disjoint, and


A ∪ B = A ∪ B ∪ ∅ ∪ ∅ ∪ ···


Hence, by [P₃′],

P(A ∪ B) = P(A ∪ B ∪ ∅ ∪ ∅ ∪ ···) = P(A) + P(B) + P(∅) + P(∅) + ···
         = P(A) + P(B) + 0 + 0 + ··· = P(A) + P(B)


which is [P₃].


3.32. Prove Theorem 3.9. Suppose S = {a₁, a₂, ..., aₙ} and each aᵢ is assigned the probability pᵢ
where: (i) pᵢ ≥ 0, and (ii) Σ pᵢ = 1. For any event A, let


P(A) = Σ(pᵢ : aᵢ ∈ A)

Then P satisfies: (a) [P₁], (b) [P₂], (c) [P₃].

(a) Each pᵢ ≥ 0; hence P(A) = Σ(pᵢ : aᵢ ∈ A) ≥ 0.
(b) Every aᵢ ∈ S; hence P(S) = p₁ + p₂ + ··· + pₙ = 1.
(c) Suppose A and B are disjoint, and

P(A) = Σ(pᵢ : aᵢ ∈ A), and P(B) = Σ(pⱼ : aⱼ ∈ B)


Then the aᵢ’s and aⱼ’s are distinct. Therefore


P(A ∪ B) = Σ(pₖ : aₖ ∈ A ∪ B) = Σ(pᵢ : aᵢ ∈ A) + Σ(pⱼ : aⱼ ∈ B) = P(A) + P(B)


3.33. Let $S = \{a_1, a_2, \ldots, a_s\}$ and $T = \{b_1, b_2, \ldots, b_t\}$ be finite probability spaces. Let the number
$p_{ij} = P(a_i)P(b_j)$ be assigned to the ordered pair $(a_i, b_j)$ in the product set
$$S \times T = \{(s, t) : s \in S,\ t \in T\}$$
Show that the $p_{ij}$ define a probability space on $S \times T$, that is, show that:
(i) The $p_{ij}$ are nonnegative. (ii) The sum of the $p_{ij}$ equals one.


(This is called the product probability space. We emphasize that this is not the only probability
function that can be defined on the product set $S \times T$.)

Since $P(a_i), P(b_j) \ge 0$ for each i and each j, we have $p_{ij} = P(a_i)P(b_j) \ge 0$. Hence (i) is true.

Also, we have


$$\begin{aligned}
p_{11} + p_{12} + \cdots + p_{1t} &+ p_{21} + p_{22} + \cdots + p_{2t} + \cdots + p_{s1} + p_{s2} + \cdots + p_{st} \\
&= P(a_1)P(b_1) + \cdots + P(a_1)P(b_t) + \cdots + P(a_s)P(b_1) + \cdots + P(a_s)P(b_t) \\
&= P(a_1)[P(b_1) + \cdots + P(b_t)] + \cdots + P(a_s)[P(b_1) + \cdots + P(b_t)] \\
&= P(a_1) \cdot 1 + \cdots + P(a_s) \cdot 1 = P(a_1) + \cdots + P(a_s) = 1
\end{aligned}$$


That is,


$$\sum_{i,j} p_{ij} = \sum_{i,j} P(a_i)P(b_j) = \sum_i P(a_i) \sum_j P(b_j) = \sum_i P(a_i) \cdot 1 = \sum_i P(a_i) = 1$$


Thus, (ii) is true.
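
The construction in Problem 3.33 can be checked numerically. The following is a minimal sketch, assuming two small illustrative spaces (the point names and probabilities below are made up for the example and do not come from the text); exact fractions are used so that the sum is compared with 1 without rounding error.

```python
from fractions import Fraction
from itertools import product

def product_space(P_S, P_T):
    """Assign p_ij = P(a_i) * P(b_j) to every ordered pair (a_i, b_j)."""
    return {(a, b): P_S[a] * P_T[b] for a, b in product(P_S, P_T)}

# Hypothetical finite probability spaces, chosen only for illustration.
P_S = {"a1": Fraction(1, 2), "a2": Fraction(1, 3), "a3": Fraction(1, 6)}
P_T = {"b1": Fraction(1, 4), "b2": Fraction(3, 4)}

P = product_space(P_S, P_T)

# (i) every p_ij is nonnegative, (ii) the p_ij sum to one.
all_nonnegative = all(p >= 0 for p in P.values())
total = sum(P.values())
print(all_nonnegative, total)
```

The same function works for any pair of dictionaries whose values are nonnegative and sum to one, which mirrors the argument in the solution.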
3.34. Prove Corollary 3.7: For any events A, B, C, we have:
$$P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C) + P(A \cap B \cap C)$$


Let $D = B \cup C$. Then $A \cap D = A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$. Using the addition rule
(Theorem 3.6), we get
$$\begin{aligned}
P(A \cap D) &= P[(A \cap B) \cup (A \cap C)] = P(A \cap B) + P(A \cap C) - P(A \cap B \cap A \cap C) \\
&= P(A \cap B) + P(A \cap C) - P(A \cap B \cap C)
\end{aligned}$$


Using the addition rule (Theorem 3.6) again, we get


$$\begin{aligned}
P(A \cup B \cup C) &= P(A \cup D) = P(A) + P(D) - P(A \cap D) = P(A) + P(B \cup C) - P(A \cap D) \\
&= P(A) + [P(B) + P(C) - P(B \cap C)] - [P(A \cap B) + P(A \cap C) - P(A \cap B \cap C)] \\
&= P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C) + P(A \cap B \cap C)
\end{aligned}$$


3.35. A die is tossed 100 times. The following table lists the six numbers and the frequency with 
which each number appeared: 


Number | 1 2 3 4 5 6 


Frequency | 14 17 20 18 15 16 


(a) Find the relative frequency f of each of the following events: 
A = {3 appears}, B = {5 appears}, C = {even number appears} 
(b) Find a probability model of the data. 


(a) The relative frequency $f = \dfrac{\text{number of successes}}{\text{total number of trials}}$. Thus
$$f_A = \frac{20}{100} = 0.20, \qquad f_B = \frac{15}{100} = 0.15, \qquad f_C = \frac{17 + 18 + 16}{100} = 0.51$$


(b) The geometric symmetry of the die indicates that we first assume an equal probability space. Statis- 
tics is then used to decide whether or not the given data supports the assumption of a fair die. 
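
As a hedged illustration of part (a), the relative frequencies can be recomputed directly from the table. The sketch below assumes only the frequency table given in the problem; the helper name `relative_frequency` is chosen for this example.

```python
from fractions import Fraction

# Frequency table from Problem 3.35 (100 tosses of a die).
freq = {1: 14, 2: 17, 3: 20, 4: 18, 5: 15, 6: 16}
total = sum(freq.values())

def relative_frequency(event):
    """Number of tosses whose outcome lies in `event`, divided by the number of tosses."""
    return Fraction(sum(freq[k] for k in event), total)

f_A = relative_frequency({3})        # 3 appears
f_B = relative_frequency({5})        # 5 appears
f_C = relative_frequency({2, 4, 6})  # an even number appears
print(float(f_A), float(f_B), float(f_C))
```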


Supplementary Problems 


SAMPLE SPACES AND EVENTS 
3.36. Let A and B be events. Find an expression and exhibit the Venn diagram for the event that: 


(a) A or not B occurs, (b) only A occurs. 


3.37. Let A, B, and C be events. Find an expression and exhibit the Venn diagram for the event that: 


(a) A or C, but not B occurs, (c) none of the events occurs, 


(b) exactly one of the three events occurs, (d) at least two of the events occur. 
3.38. A penny, a dime, and a die are tossed. Describe a suitable sample space S, and find n(S).


3.39. For the space S in Problem 3.38, express explicitly the following events:

A = {two heads and an even number}, B = {2 appears}

C = {exactly one head and an odd number}


3.40. For the events A, B, C in Problem 3.39, express explicitly the event:
(a) A and B, (b) only B, (c) B and C, (d) A but not B.


FINITE EQUIPROBABLE SPACES 


3.41. Determine the probability of each event:

(a) An odd number appears in the toss of a fair die.
(b) 1 or more heads appear in the toss of 4 fair coins.
(c) Both numbers exceed 4 in the toss of 2 fair dice.
(d) Exactly one 6 appears in the toss of 2 fair dice.
(e) A red or a face card appears when a card is randomly selected from a 52-card deck.


3.42. A student is chosen at random to represent a class with 5 freshmen, 4 sophomores, 8 juniors,
and 3 seniors. Find the probability that the student is

(a) a sophomore (b) a senior (c) a junior or a senior


3.43. One card is selected at random from 25 cards numbered 1 to 25. Find the probability that the number
on the card is: (a) even, (b) divisible by 3, (c) even and divisible by 3, (d) even or divisible by 3,
(e) ends in the digit 2.


3.44. Three bolts and three nuts are in a box. Two parts are chosen at random. Find the probability that one
is a bolt and one is a nut.


3.45. A box contains 2 white sox, 2 blue sox, and 2 red sox. Two sox are drawn at random. Find the
probability they are a match (same color).


3.46. Of 120 students, 60 are studying French, 50 are studying Spanish, and 20 are studying both French and
Spanish. A student is chosen at random. Find the probability that the student is studying:

(a) French and Spanish (d) only French
(b) French or Spanish (e) exactly one of the two languages.
(c) neither French nor Spanish


3.47. Of 10 girls in a class, 3 have blue eyes. Two of the girls are chosen at random. Find the
probability that:

(a) both have blue eyes (c) at least one has blue eyes
(b) neither has blue eyes (d) exactly one has blue eyes.


3.48. Ten students A, B, ... are in a class. A committee of 3 is chosen from the class. Find the
probability that

(a) A belongs to the committee. (c) A and B belong to the committee.
(b) B belongs to the committee. (d) A or B belongs to the committee.
FINITE PROBABILITY SPACES


3.49. Under which of the following functions does $S = \{a_1, a_2, a_3\}$ become a probability space?

(a) $P(a_1) = 0.3$, $P(a_2) = 0.4$, $P(a_3) = 0.5$ (c) $P(a_1) = 0.3$, $P(a_2) = 0.2$, $P(a_3) = 0.5$
(b) $P(a_1) = 0.7$, $P(a_2) = -0.2$, $P(a_3) = 0.5$ (d) $P(a_1) = 0.3$, $P(a_2) = 0$, $P(a_3) = 0.7$


3.50. A coin is weighted so that heads is three times as likely to appear as tails. Find P(H) and P(T).


3.51. Suppose A and B are events with P(A) = 0.7, P(B) = 0.5, and $P(A \cap B) = 0.4$. Find the probability
that

(a) A does not occur. (c) A but not B occurs.
(b) A or B occurs. (d) Neither A nor B occurs.


3.52. Consider the following probability distribution:


Outcome | 1 2 3 4 5 6


Probability | 0.1 0.3 0.1 0.2 0.2 0.1


Consider the following events:
A = {even number}, B = {2, 3, 4, 5}, C = {1, 2}
Find: (a) P(A), (b) P(B), (c) P(C), (d) $P(\emptyset)$, (e) P(S).


3.53. For the events A, B, C in Problem 3.52, find:
(a) $P(A \cap B)$, (b) $P(A \cup C)$, (c) $P(B \cap C)$, (d) $P(A^c)$, (e) $P(B \cap C^c)$.


3.54. Three students A, B, and C are in a swimming race. A and B have the same probability of winning and
each is twice as likely to win as C. Find the probability that

(a) B wins (b) C wins (c) B or C wins


3.55. Let P be a probability function on $S = \{a_1, a_2, a_3\}$. Find $P(a_1)$ if
(a) $P(a_2) = 0.3$, $P(a_3) = 0.5$; (c) $P(\{a_2, a_3\}) = 2P(a_1)$;
(b) $P(a_1) = 2P(a_2)$ and $P(a_3) = 0.7$; (d) $P(a_3) = 2P(a_2)$ and $P(a_2) = 3P(a_1)$.


ODDS


3.56. Find the probability of an event E if the odds that it will occur are: (a) 2 to 1, (b) 5 to 11.


3.57. Find the odds that an event E occurs if: (a) P(E) = 2/7, (b) P(E) = 0.4.


3.58. In a swimming race, the odds that A will win are 2 to 3 and the odds that B will win are 1 to 4. Find the
probability p and the odds that: (a) A will lose, (b) A or B will win, (c) neither A nor B will win.


NONCOUNTABLE UNIFORM SPACES 


3.59. A point is chosen at random inside a circle with radius r. Find the probability p that the point is at most
$\frac{1}{3}r$ from the center.


3.60. A point A is selected at random inside an equilateral triangle whose side length is 3. Find the probability
p that the distance of A from any corner is greater than 1.


3.61. A coin of diameter 1/2 is tossed randomly onto the plane $\mathbf{R}^2$. Find the probability p that the coin does
not intersect any line of the form: (a) x = k or y = k where k is an integer, (b) x + y = k where k is an
integer.


3.62. A point X is selected at random from a line segment AB with midpoint O. Find the probability p that
the line segments AX, XB, and AO can form a triangle.


MISCELLANEOUS PROBLEMS 


3.63. A die is tossed 50 times. The following table gives the 6 numbers and their frequency of occurrence:


Number | 1 2 3 4 5 6


Frequency | 7 9 8 7 9 10


Find the relative frequency of each event: (a) 4 appears, (b) an odd number appears, (c) a number greater
than 4 appears.


3.64. Use mathematical induction to prove: For any events $A_1, A_2, \ldots, A_n$,
$$P(A_1 \cup \cdots \cup A_n) = \sum_i P(A_i) - \sum_{i<j} P(A_i \cap A_j) + \sum_{i<j<k} P(A_i \cap A_j \cap A_k) - \cdots \pm P(A_1 \cap \cdots \cap A_n)$$


Remark: This result generalizes Theorem 3.6 (addition rule) for two sets and Corollary 3.7 for
three sets.


3.65. Consider the countably infinite sample space $S = \{a_1, a_2, a_3, \ldots\}$. Suppose $P(a_1) = 1/4$ and suppose
$P(a_{k+1}) = rP(a_k)$ for k = 1, 2, .... Find r and $P(a_3)$.


Answers to Supplementary Problems


3.36. (a) $A \cup B^c$; (b) $A \cap B^c$.

3.37. (a) $(A \cup C) \cap B^c$; (b) $(A \cap B^c \cap C^c) \cup (A^c \cap B \cap C^c) \cup (A^c \cap B^c \cap C)$; (c) $(A \cup B \cup C)^c = A^c \cap B^c \cap C^c$;
(d) $(A \cap B) \cup (A \cap C) \cup (B \cap C)$.

3.38. n(S) = 24; S = {H, T} × {H, T} × {1, 2, ..., 6} = {HH1, ..., HH6, HT1, ..., TT6}.

3.39. A = {HH2, HH4, HH6}; B = {HH2, HT2, TH2, TT2}; C = {HT1, HT3, HT5, TH1, TH3, TH5}.

3.40. (a) {HH2}; (b) {HT2, TH2, TT2}; (c) $\emptyset$; (d) {HH4, HH6}.

3.41. (a) 3/6; (b) 15/16; (c) 4/36; (d) 10/36; (e) 32/52.

3.42. (a) 4/20; (b) 3/20; (c) 11/20.

3.43. (a) 12/25; (b) 8/25; (c) 4/25; (d) 16/25; (e) 3/25.

3.44. 9/15 = 3/5.

3.45. 3/15 = 1/5.
3.46. (a) 1/6; (b) 3/4; (c) 1/4; (d) 1/3; (e) 7/12.

3.47. (a) 1/15; (b) 7/15; (c) 8/15; (d) 7/15.

3.48. (a) 3/10; (b) 3/10; (c) 1/15; (d) 8/15.

3.49. (c) and (d).

3.50. P(H) = 3/4; P(T) = 1/4.

3.51. (a) 0.3; (b) 0.8; (c) 0.3; (d) 0.2.

3.52. (a) 0.6; (b) 0.8; (c) 0.4; (d) 0; (e) 1.

3.53. (a) 0.5; (b) 0.7; (c) 0.3; (d) 0.4; (e) 0.5.

3.54. (a) 2/5; (b) 1/5; (c) 3/5.

3.55. (a) 0.2; (b) 0.2; (c) 1/3; (d) 0.1.

3.56. (a) 2/3; (b) 5/16.

3.57. (a) 2 to 5; (b) 2 to 3.

3.58. (a) p = 3/5, odds 3 to 2; (b) p = 3/5, odds 3 to 2; (c) p = 2/5, odds 2 to 3.

3.59. 1/9.

3.60. $1 - 2\pi/(9\sqrt{3}) = 1 - 2\sqrt{3}\,\pi/27$.

3.61. (a) 1/4; (b) $1 - \sqrt{2}/2$.

3.62. 1/2.

3.63. (a) 7/50; (b) 24/50; (c) 19/50.

3.65. r = 3/4; $P(a_3) = 9/64$.
Conditional Probability and Independence


4.1 INTRODUCTION


The notions of conditional probability and independence will be motivated by two well-known
examples.


(a) Gender Gap: Suppose candidate A receives 54 percent of the entire vote, but only 48 percent
of the female vote. Let P(A) denote the probability that a random person voted for A, but let
P(A|W) denote the probability that a random woman voted for A. Then


P(A) = 0.54 but P(A|W) = 0.48


P(A|W) is called the conditional probability of A given W. Note that P(A|W) only looks at the
reduced sample space consisting of women. The fact that P(A) ≠ P(A|W) is called the gender
gap in politics. On the other hand, suppose P(A) = P(A|W). We then say there is no gender
gap, that is, the probability that a person voted for A is "independent" of the gender of the
voter.


(b) Insurance Rates: Auto insurance rates usually depend on the probability that a random person
will be involved in an accident. It is well known that male drivers under 25 years old get into
more accidents than the general public. That is, letting P(A) denote the probability of an
accident and letting E denote male drivers under 25 years old, the data tell us that


P(A) < P(A|E)


Again we use the notation P(A|E) to denote the probability of an accident given that the driver
is male and under 25 years old.


This chapter formally defines conditional probability and independence. We also cover finite
stochastic processes, Bayes' theorem, and independent repeated trials.
4.2 CONDITIONAL PROBABILITY


Suppose E is an event in a sample space S with P(E) > 0. The probability that an event A occurs
once E has occurred or, specifically, the conditional probability of A given E, written P(A|E), is defined
as follows:


$$P(A|E) = \frac{P(A \cap E)}{P(E)}$$


As pictured in the Venn diagram in Fig. 4-1, P(A|E) measures, in a certain sense, the relative
probability of A with respect to the reduced space E.


Fig. 4-1


Now suppose S is an equiprobable space, and we let n(A) denote the number of elements in the
event A. Then


$$P(A \cap E) = \frac{n(A \cap E)}{n(S)}, \qquad P(E) = \frac{n(E)}{n(S)}, \qquad\text{and so}\qquad P(A|E) = \frac{P(A \cap E)}{P(E)} = \frac{n(A \cap E)}{n(E)}$$


We state this result formally.


Theorem 4.1: Suppose S is an equiprobable space and A and E are events. Then


$$P(A|E) = \frac{\text{number of elements in } A \cap E}{\text{number of elements in } E} = \frac{n(A \cap E)}{n(E)}$$


EXAMPLE 4.1 A pair of fair dice is tossed. The sample space S consists of the 36 ordered pairs (a, b) where 
a and b can be any of the integers from 1 to 6. (See Fig. 3-3.) Thus, the probability of any point is 1/36. Find 
the probability that one of the dice is 2 if the sum is 6. That is, find P(A|E) where 


E = {sum is 6} and A = {2 appears on at least one die} 
Also find P(A). 


Now E consists of five elements, specifically
E = {(1,5), (2,4), (3,3), (4,2), (5,1)}
Two of them, (2,4) and (4,2), belong to A, that is, $A \cap E$ = {(2,4), (4,2)}. By Theorem 4.1, P(A|E) = 2/5.
On the other hand, A consists of 11 elements, specifically:


A = {(2,1), (2,2), (2,3), (2,4), (2,5), (2,6), (1,2), (3,2), (4,2), (5,2), (6,2)}
and S consists of 36 elements; hence P(A) = 11/36.
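
Theorem 4.1 reduces conditional probability in an equiprobable space to counting, so Example 4.1 can be verified by enumeration. The following is a minimal sketch under that assumption; the helper name `cond_prob` is chosen for this illustration.

```python
from fractions import Fraction
from itertools import product

# Equiprobable sample space: the 36 ordered pairs (a, b) for two fair dice.
S = set(product(range(1, 7), repeat=2))

E = {s for s in S if sum(s) == 6}  # sum is 6
A = {s for s in S if 2 in s}       # 2 appears on at least one die

def cond_prob(A, E):
    """P(A|E) = n(A ∩ E) / n(E); valid only for an equiprobable space."""
    return Fraction(len(A & E), len(E))

p_A_given_E = cond_prob(A, E)
p_A = Fraction(len(A), len(S))
print(p_A_given_E, p_A)
```

Here conditioning on E raises the probability of A from 11/36 to 2/5, since the reduced space has only five points.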


EXAMPLE 4.2 Suppose a couple has two children. The sample space is S = {bb, bg, gb, gg} where we assume
an equiprobable space, that is, we assume probability 1/4 for each point. Find the probability p that both children 
are boys if it is known that: (a) At least one of the children is a boy. (b) The older child is a boy. 


(a) Here the reduced sample space consists of three elements {bb, bg, gb}; hence p = 1/3. 


(b) Here the reduced sample space consists of two elements {bb, bg}; hence p = 1/2. 
Multiplication Theorem for Conditional Probability


Suppose A and B are events in a sample space S with P(A) > 0. By definition of conditional
probability and using $A \cap B = B \cap A$, we obtain:


$$P(B|A) = \frac{P(B \cap A)}{P(A)} = \frac{P(A \cap B)}{P(A)}$$


Multiplying both sides by P(A) gives us the following useful formula:
Theorem 4.2 (Multiplication Theorem for Conditional Probability):
$$P(A \cap B) = P(A)P(B|A)$$


The multiplication theorem gives us a formula for the probability that events A and B both
occur. It can be extended to three or more events. For three events, we get:


Corollary 4.3: $P(A \cap B \cap C) = P(A)P(B|A)P(C|A \cap B)$
That is, the probability that A, B, and C occur is equal to the product of the following: 


(i) The probability that A occurs. 
(ii) The probability that B occurs, assuming that A occurred. 
(iii) The probability that C occurs, assuming that A and B have occurred. 


We apply this result in the following example. 


EXAMPLE 4.3 A lot contains 12 items of which 4 are defective. Three items are drawn at random from the 
lot one after the other. Find the probability p that all 3 are nondefective. 


We compute the following 3 probabilities: 


(i) The probability that the first item is nondefective is 8/12 since 8 of 12 items are nondefective.


(ii) Assuming that the first item is nondefective, the probability that the second item is nondefective is 7/11 since
only 7 of the remaining 11 items are nondefective.


(iii) Assuming that the first and second items are nondefective, the probability that the third item is nondefective
is 6/10, since only 6 of the remaining 10 items are now nondefective.


Accordingly, by the multiplication theorem,


$$p = \frac{8}{12} \cdot \frac{7}{11} \cdot \frac{6}{10} = \frac{14}{55}$$
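
The chain of conditional probabilities in Example 4.3 can be written as a short loop. This is a sketch, assuming draws without replacement as in the example; the function name is illustrative, and the cross-check with binomial coefficients is an independent counting argument rather than part of the text's solution.

```python
from fractions import Fraction
from math import comb

def prob_all_nondefective(total, defective, draws):
    """Multiply P(first good) * P(second good | first good) * ... for draws without replacement."""
    good = total - defective
    p = Fraction(1)
    for i in range(draws):
        # After i good items are removed, good - i good items remain among total - i.
        p *= Fraction(good - i, total - i)
    return p

p = prob_all_nondefective(total=12, defective=4, draws=3)

# Cross-check by counting: choose 3 of the 8 good items out of all 3-item subsets.
p_by_counting = Fraction(comb(8, 3), comb(12, 3))
print(p, p_by_counting)
```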


4.3 FINITE STOCHASTIC PROCESSES AND TREE DIAGRAMS


A (finite) stochastic process is a finite sequence of experiments where each experiment has a finite 
number of outcomes with given probabilities. A convenient way of describing such a process is by 
means of a labeled tree diagram, as illustrated below. The multiplication theorem (Theorem 4.2) 
can then be used to compute the probability of an event which is represented by a given path of 
the tree. 


EXAMPLE 4.4 Suppose the following three boxes are given: 


Box X has 10 lightbulbs of which 4 are defective. 
Box Y has 6 lightbulbs of which 1 is defective. 
Box Z has 8 lightbulbs of which 3 are defective. 


A box is chosen at random, and then a bulb is randomly selected from the chosen box. 
(a) Find the probability p that the bulb is nondefective.
(b) If the bulb is nondefective, find the probability that it came from box Z. 


Here we perform a sequence of two experiments: 
(i) Select one of the three boxes. 
(ii) Select a bulb which is either defective (D) or nondefective (N). 


The tree diagram in Fig. 4-2 describes this process and gives the probability of each edge of the tree. The 
multiplication theorem tells us that the probability of a given path of the tree is the product of the probabilities 
of each edge of the path. For example, the probability of selecting box X and then a nondefective bulb N from 
box X is as follows: 


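$$P(X \cap N) = \frac{1}{3} \cdot \frac{3}{5} = \frac{1}{5}$$

[Tree diagram: the root branches to boxes X, Y, Z, each with probability 1/3; box X branches to D (2/5) and N (3/5), box Y to D (1/6) and N (5/6), box Z to D (3/8) and N (5/8).]

Fig. 4-2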


(a) Since there are three disjoint paths which lead to a nondefective bulb N, the sum of the probabilities of these 
paths gives us the required probability. Namely 


$$p = P(N) = \frac{1}{3} \cdot \frac{3}{5} + \frac{1}{3} \cdot \frac{5}{6} + \frac{1}{3} \cdot \frac{5}{8} = \frac{247}{360}$$


(b) Here we want to compute P(Z|N), the conditional probability of box Z given a nondefective bulb N.
Now box Z and a nondefective bulb N, that is, the event $Z \cap N$, can only occur on the bottom
path. Therefore


$$P(Z \cap N) = \frac{1}{3} \cdot \frac{5}{8} = \frac{5}{24}$$


By part (a), we have P(N) = 247/360. Accordingly, by the definition of conditional probability,


$$P(Z|N) = \frac{P(Z \cap N)}{P(N)} = \frac{5/24}{247/360} = \frac{75}{247} \approx 0.304$$


In other words, we divide the probability of the successful path by the probability of the reduced sample 
space consisting of all the paths leading to N. 
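
The tree computation in Example 4.4 amounts to multiplying along each path and summing over the paths that end in N. A minimal sketch, assuming the box contents stated in the example (the dictionary layout and variable names are choices made for this illustration):

```python
from fractions import Fraction

# Box name -> (number of bulbs, number of defective bulbs), as in Example 4.4.
boxes = {"X": (10, 4), "Y": (6, 1), "Z": (8, 3)}
p_box = Fraction(1, len(boxes))  # each box is equally likely to be chosen

# Probability of each path "choose box, then draw a nondefective bulb N".
path_N = {
    name: p_box * Fraction(total - defective, total)
    for name, (total, defective) in boxes.items()
}

p_N = sum(path_N.values())        # (a) sum over the three disjoint paths
p_Z_given_N = path_N["Z"] / p_N   # (b) successful path divided by P(N)
print(p_N, p_Z_given_N)
```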


EXAMPLE 4.5 Suppose a coin, weighted so that P(H) = 2/3 and P(T) = 1/3, is tossed. If heads appears, then 
a number is selected at random from the numbers 1 through 9; if tails appears, then a number is selected at random 
from the numbers 1 through 5. Find the probability that an even number appears. 

Note that the probability of selecting an even number from the numbers 1 through 9 is 4/9 since there are 4 even
numbers out of the 9 numbers, whereas the probability of selecting an even number from the numbers 1 through
5 is 2/5 since there are 2 even numbers out of the 5 numbers. Thus, Fig. 4-3 is the tree diagram with the respective
probabilities which represents the above stochastic process. (Here E means an even number is selected and O
means an odd number is selected.) There are two paths in the tree which lead to an even number, HE and
TE. Thus


$$p = P(E) = \frac{2}{3} \cdot \frac{4}{9} + \frac{1}{3} \cdot \frac{2}{5} = \frac{58}{135} \approx 0.43$$


4.4 PARTITIONS, TOTAL PROBABILITY, AND BAYES’ FORMULA 


Suppose a set S is the union of mutually disjoint subsets $A_1, A_2, \ldots, A_n$, that is, suppose the sets
$A_1, A_2, \ldots, A_n$ form a partition of the set S. Furthermore, suppose E is any subset of S. Then, as
illustrated in Fig. 4-4 for the case n = 3,


$$E = E \cap S = E \cap (A_1 \cup A_2 \cup \cdots \cup A_n) = (E \cap A_1) \cup (E \cap A_2) \cup \cdots \cup (E \cap A_n)$$


Moreover, the n subsets on the right in the above equation are also mutually disjoint, that is, form a
partition of E.


Fig. 4-4


Law of Total Probability


Now suppose S is a sample space and the above subsets $A_1, A_2, \ldots, A_n$, E are events. Since the
$E \cap A_k$ are disjoint, we obtain
$$P(E) = P(E \cap A_1) + P(E \cap A_2) + \cdots + P(E \cap A_n)$$
Using the multiplication theorem for conditional probability, we also obtain
$$P(E \cap A_k) = P(A_k \cap E) = P(A_k)P(E|A_k)$$
Thus, we arrive at the following theorem:


Theorem 4.4 (Total Probability): Let E be an event in a sample space S, and let $A_1, A_2, \ldots, A_n$ be
mutually disjoint events whose union is S. Then


$$P(E) = P(A_1)P(E|A_1) + P(A_2)P(E|A_2) + \cdots + P(A_n)P(E|A_n)$$


The equation in Theorem 4.4 is called the law of total probability. We emphasize that the sets $A_1$,
$A_2, \ldots, A_n$ are pairwise disjoint and their union is all of S, that is, that the A's form a partition of S.


EXAMPLE 4.6 A factory uses three machines X, Y, Z to produce certain items. Suppose: 
(1) Machine X produces 50 percent of the items of which 3 percent are defective. 
(2) Machine Y produces 30 percent of the items of which 4 percent are defective. 
(3) Machine Z produces 20 percent of the items of which 5 percent are defective. 
Find the probability p that a randomly selected item is defective. 
Let D denote the event that an item is defective. Then, by the law of total probability, 
$$\begin{aligned}
P(D) &= P(X)P(D|X) + P(Y)P(D|Y) + P(Z)P(D|Z) \\
&= (0.50)(0.03) + (0.30)(0.04) + (0.20)(0.05) = 0.037 = 3.7\%
\end{aligned}$$


Bayes’ Theorem 


Suppose the events $A_1, A_2, \ldots, A_n$ do form a partition of the sample space S, and E is any
event. Then, for k = 1, 2, ..., n, the multiplication theorem for conditional probability tells us that
$P(A_k \cap E) = P(A_k)P(E|A_k)$. Therefore


$$P(A_k|E) = \frac{P(A_k \cap E)}{P(E)} = \frac{P(A_k)P(E|A_k)}{P(E)}$$


Using the law of total probability (Theorem 4.4) for the denominator P(E), we arrive at the next
theorem.


Theorem 4.5 (Bayes' Formula): Let E be an event in a sample space S, and let $A_1, A_2, \ldots, A_n$ be
disjoint events whose union is S. Then, for k = 1, 2, ..., n,


$$P(A_k|E) = \frac{P(A_k)P(E|A_k)}{P(A_1)P(E|A_1) + P(A_2)P(E|A_2) + \cdots + P(A_n)P(E|A_n)}$$


The above equation is called Bayes' rule or Bayes' formula, after the English mathematician
Thomas Bayes (1702-1761). If we think of the events $A_1, A_2, \ldots, A_n$ as possible causes of the event
E, then Bayes' formula enables us to determine the probability that a particular one of the A's
occurred, given that E occurred.


EXAMPLE 4.7 Consider the factory in Example 4.6. Suppose a defective item is found among the output.
Find the probability that it came from each of the machines, that is, find P(X|D), P(Y|D), and P(Z|D).
Recall that P(D) = P(X)P(D|X) + P(Y)P(D|Y) + P(Z)P(D|Z) = 0.037. Therefore, by Bayes' formula,

$$P(X|D) = \frac{P(X)P(D|X)}{P(D)} = \frac{(0.50)(0.03)}{0.037} = \frac{15}{37}$$

$$P(Y|D) = \frac{P(Y)P(D|Y)}{P(D)} = \frac{(0.30)(0.04)}{0.037} = \frac{12}{37}$$

$$P(Z|D) = \frac{P(Z)P(D|Z)}{P(D)} = \frac{(0.20)(0.05)}{0.037} = \frac{10}{37}$$
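
The law of total probability and Bayes' formula translate into two short functions. The sketch below assumes the priors and conditional probabilities are given as dictionaries keyed by the events of the partition; the function names `total_probability` and `bayes` are chosen for this illustration, and the data are those of Examples 4.6 and 4.7.

```python
from fractions import Fraction

def total_probability(prior, likelihood):
    """P(E) = sum over k of P(A_k) * P(E|A_k), for a partition A_1, ..., A_n."""
    return sum(prior[k] * likelihood[k] for k in prior)

def bayes(prior, likelihood):
    """Return P(A_k|E) for every k, using the total probability as denominator."""
    p_E = total_probability(prior, likelihood)
    return {k: prior[k] * likelihood[k] / p_E for k in prior}

# Machines X, Y, Z: share of production and defect rate (Example 4.6).
prior = {"X": Fraction(50, 100), "Y": Fraction(30, 100), "Z": Fraction(20, 100)}
likelihood = {"X": Fraction(3, 100), "Y": Fraction(4, 100), "Z": Fraction(5, 100)}

p_D = total_probability(prior, likelihood)
posterior = bayes(prior, likelihood)
print(p_D, posterior)
```

The posterior values always sum to 1 because the numerators are exactly the terms of the denominator.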


Stochastic Interpretation of Total Probability and Bayes’ Formula 


Frequently, problems involving the total probability law and Bayes' formula can be interpreted as
a two-step stochastic process. Figure 4-5 gives the stochastic tree corresponding to Fig. 4-4 where the
first step in the tree involves the events $A_1, A_2, A_3$ which partition S and the second step involves the
arbitrary event E.
Suppose we want P(E). Using the tree diagram, we obtain
$$P(E) = P(A_1)P(E|A_1) + P(A_2)P(E|A_2) + P(A_3)P(E|A_3)$$
Furthermore, for k = 1, 2, 3,
$$P(A_k|E) = \frac{P(A_k \cap E)}{P(E)} = \frac{P(A_k)P(E|A_k)}{P(E)} = \frac{P(A_k)P(E|A_k)}{P(A_1)P(E|A_1) + P(A_2)P(E|A_2) + P(A_3)P(E|A_3)}$$


Observe that the above two formulas are simply the total probability law and Bayes' formula, for the
case n = 3. The stochastic approach also applies to any positive integer n.


EXAMPLE 4.8 Suppose a student dormitory in a college consists of: 


(1) 30 percent freshmen of whom 10 percent own a car 
(2) 40 percent sophomores of whom 20 percent own a car 
(3) 20 percent juniors of whom 40 percent own a car 


(4) 10 percent seniors of whom 60 percent own a car 


(a) Find the probability that a student in the dormitory owns a car. 


(b) If a student does own a car, find the probability that the student is a junior.


Let A, B, C, D denote, respectively, the set of freshmen, sophomores, juniors, and seniors, and let E denote
the set of students owning a car. Figure 4-6 is a stochastic tree describing the given data.


[Fig. 4-6: stochastic tree whose first-step branches are A (30%), B (40%), C (20%), D (10%), each followed by a branch to E with probability 10%, 20%, 40%, 60%, respectively.]


(a) We seek P(E). By the law of total probability or by using Fig. 4-6, we obtain
$$\begin{aligned}
P(E) &= (0.30)(0.10) + (0.40)(0.20) + (0.20)(0.40) + (0.10)(0.60) \\
&= 0.03 + 0.08 + 0.08 + 0.06 = 0.25 = 25\%
\end{aligned}$$
(b) We seek P(C|E). By Bayes' formula,


$$P(C|E) = \frac{P(C)P(E|C)}{P(E)} = \frac{(0.20)(0.40)}{0.25} = \frac{8}{25} = 32\%$$


4.5 INDEPENDENT EVENTS


Events A and B in a probability space S are said to be independent if the occurrence of one of them
does not influence the occurrence of the other. More specifically, B is independent of A if P(B) is the
same as P(B|A). Now suppose we substitute P(B) for P(B|A) in the multiplication theorem
$P(A \cap B) = P(A)P(B|A)$. This yields


$$P(A \cap B) = P(A)P(B)$$


We formally use the above equation as our definition of independence.


Definition: Events A and B are independent if $P(A \cap B) = P(A)P(B)$; otherwise they are
dependent.


We emphasize that independence is a symmetric relationship. In particular, when P(A) and P(B) are both positive,


$$P(A \cap B) = P(A)P(B) \quad\text{implies both}\quad P(B|A) = P(B) \quad\text{and}\quad P(A|B) = P(A)$$


Note also that disjoint (mutually exclusive) events are not independent unless one of them has
zero probability. That is, suppose $A \cap B = \emptyset$ and A and B are independent. Then


$$P(A)P(B) = P(A \cap B) = 0 \quad\text{and so}\quad P(A) = 0 \ \text{ or } \ P(B) = 0$$


EXAMPLE 4.9 A fair coin is tossed three times yielding the equiprobable space 
S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT} 
Consider the events: 
A = {first toss is heads} = {HHH, HHT, HTH, HTT} 
B = {second toss is heads} = {HHH, HHT, THH, THT} 
C = {exactly two heads in a row} = {HHT, THH} 


Clearly A and B are independent events; this fact is verified below. On the other hand, the relationship between 
A and C and between B and C is not obvious. We claim that A and C are independent, but that B and C are 
dependent. Note that 


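$$P(A) = \frac{4}{8} = \frac{1}{2}, \qquad P(B) = \frac{4}{8} = \frac{1}{2}, \qquad P(C) = \frac{2}{8} = \frac{1}{4}$$
Also


$$P(A \cap B) = P(\{HHH, HHT\}) = \frac{1}{4}, \qquad P(A \cap C) = P(\{HHT\}) = \frac{1}{8}, \qquad P(B \cap C) = P(\{HHT, THH\}) = \frac{1}{4}$$
Accordingly

$P(A)P(B) = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4} = P(A \cap B)$; hence A and B are independent.

$P(A)P(C) = \frac{1}{2} \cdot \frac{1}{4} = \frac{1}{8} = P(A \cap C)$; hence A and C are independent.

$P(B)P(C) = \frac{1}{2} \cdot \frac{1}{4} = \frac{1}{8} \ne P(B \cap C)$; hence B and C are not independent.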
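
The three checks in Example 4.9 follow directly from the definition of independence and can be reproduced by enumerating the eight outcomes. A minimal sketch, assuming the equiprobable space of the example (the helper names `P` and `independent` are illustrative):

```python
from fractions import Fraction
from itertools import product

# Equiprobable space for three tosses of a fair coin: HHH, HHT, ..., TTT.
S = {"".join(t) for t in product("HT", repeat=3)}

def P(event):
    """Probability of an event in the equiprobable space S."""
    return Fraction(len(event), len(S))

def independent(A, B):
    """A and B are independent exactly when P(A ∩ B) = P(A) P(B)."""
    return P(A & B) == P(A) * P(B)

A = {s for s in S if s[0] == "H"}  # first toss is heads
B = {s for s in S if s[1] == "H"}  # second toss is heads
C = {"HHT", "THH"}                 # exactly two heads in a row

print(independent(A, B), independent(A, C), independent(B, C))
```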


Frequently, we will postulate that two events are independent, or the experiment itself will imply 
that two events are independent. 


EXAMPLE 4.10 The probability that A hits a target is 1/4, and the probability that B hits the target is 2/5. Both
shoot at the target. Find the probability that at least one of them hits the target, that is, find the probability that
A or B (or both) hits the target.


Here P(A) = 1/4 and P(B) = 2/5, and we seek $P(A \cup B)$.


Furthermore, we assume that A and B are independent events, that is, that the probability that A or B hits
the target is not influenced by what the other does. Therefore

$$P(A \cap B) = P(A)P(B) = \frac{1}{4} \cdot \frac{2}{5} = \frac{1}{10}$$


Accordingly, by the addition rule in Theorem 3.6,


$$P(A \cup B) = P(A) + P(B) - P(A \cap B) = \frac{1}{4} + \frac{2}{5} - \frac{1}{10} = \frac{11}{20}$$


Independence of Three or More Events 
Three events A, B, C are independent if the following two conditions hold: 
(1) They are pairwise independent, that is, 
$$P(A \cap B) = P(A)P(B), \qquad P(A \cap C) = P(A)P(C), \qquad P(B \cap C) = P(B)P(C)$$
(2) $P(A \cap B \cap C) = P(A)P(B)P(C)$


Example 4.11 below shows that condition (2) does not follow from condition (1), that is, three events
may be pairwise independent but not independent themselves. [Problem 4.32 shows that condition
(1) does not follow from condition (2).]

Independence of more than three events is defined analogously. Namely, the events $A_1, A_2, \ldots,
A_n$ are independent if any proper subset of them is independent and


$$P(A_1 \cap A_2 \cap \cdots \cap A_n) = P(A_1)P(A_2) \cdots P(A_n)$$


Observe that induction is used in this definition. 


EXAMPLE 4.11 A pair of fair coins is tossed yielding the equiprobable space S = {HH, HT, TH, TT}.
Consider the events:


A = {head on first toss} = {HH, HT}, B = {head on second toss} = {HH, TH},
C = {head on exactly one coin} = {HT, TH}


Then P(A) = P(B) = P(C) = 2/4 = 1/2. Also


$$P(A \cap B) = P(\{HH\}) = \frac{1}{4}, \qquad P(A \cap C) = P(\{HT\}) = \frac{1}{4}, \qquad P(B \cap C) = P(\{TH\}) = \frac{1}{4}$$
Thus, condition (1) is satisfied, that is, the events are pairwise independent. On the other hand,
$$A \cap B \cap C = \emptyset \quad\text{and so}\quad P(A \cap B \cap C) = P(\emptyset) = 0 \ne P(A)P(B)P(C)$$


Thus, condition (2) is not satisfied, and so the three events are not independent. 
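
The inductive definition of independence for several events can be checked mechanically by testing the product rule on every subcollection of two or more events. The sketch below applies that idea to Example 4.11; the function name `independent` and the use of `itertools.combinations` are choices made for this illustration, and `math.prod` requires Python 3.8 or later.

```python
from fractions import Fraction
from itertools import combinations
from math import prod

# Equiprobable space for two tosses of a fair coin.
S = {"HH", "HT", "TH", "TT"}

def P(event):
    return Fraction(len(event), len(S))

def independent(events):
    """True when the product rule holds for every subcollection of 2 or more events."""
    for r in range(2, len(events) + 1):
        for subset in combinations(events, r):
            joint = set(S).intersection(*subset)
            if P(joint) != prod(P(e) for e in subset):
                return False
    return True

A = {"HH", "HT"}  # head on first toss
B = {"HH", "TH"}  # head on second toss
C = {"HT", "TH"}  # head on exactly one coin

pairwise = all(independent(pair) for pair in combinations([A, B, C], 2))
mutual = independent([A, B, C])
print(pairwise, mutual)
```

The output shows the events are pairwise independent but fail the three-way product condition.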


4.6 INDEPENDENT REPEATED TRIALS 


Previously, we discussed probability spaces which were associated with an experiment repeated a 
finite number of times, such as the tossing of a coin three times. This concept of repetition is 
formalized as follows: 


Definition: Let S be a finite probability space. The probability space of n independent or repeated
trials, denoted by $S_n$, consists of ordered n-tuples of elements of S with the probability of
an n-tuple defined to be the product of the probability of its components, that is,


$$P((s_1, s_2, \ldots, s_n)) = P(s_1)P(s_2) \cdots P(s_n)$$


EXAMPLE 4.12 Suppose that whenever three horses a, b, c race together, their respective probabilities of 
winning are 1/2, 1/3, and 1/6. In other words, 


$$S = \{a, b, c\} \quad\text{with}\quad P(a) = \frac{1}{2}, \quad P(b) = \frac{1}{3}, \quad\text{and}\quad P(c) = \frac{1}{6}$$
Suppose the horses race twice. Then the sample space $S_2$ of the two repeated trials follows:


$S_2$ = {aa, ab, ac, ba, bb, bc, ca, cb, cc}


For notational convenience, we have written ac for the ordered pair (a, c). The probability of each point of $S_2$
follows:


P(aa) = P(a)P(a) = (1/2)(1/2) = 1/4,   P(ba) = 1/6,    P(ca) = 1/12
P(ab) = P(a)P(b) = (1/2)(1/3) = 1/6,   P(bb) = 1/9,    P(cb) = 1/18
P(ac) = P(a)P(c) = (1/2)(1/6) = 1/12,  P(bc) = 1/18,   P(cc) = 1/36


Thus, the probability that c wins the first race and a wins the second race is P(ca) = 1/12.
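
The probability space of n repeated trials can be generated from the definition by taking all ordered n-tuples and multiplying component probabilities. A minimal sketch using the horse-race data of Example 4.12; the function name `repeated_trials` is illustrative, and outcomes are joined into strings such as "ca" to match the notation of the example.

```python
from fractions import Fraction
from itertools import product
from math import prod

# Single-race probabilities from Example 4.12.
P = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}

def repeated_trials(P, n):
    """Map each ordered n-tuple of outcomes to the product of its component probabilities."""
    return {"".join(t): prod(P[s] for s in t) for t in product(P, repeat=n)}

S2 = repeated_trials(P, 2)
print(S2["ca"], sum(S2.values()))
```

Calling `repeated_trials(P, 3)` would give the 27-point space for three races in the same way.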
Repeated Trials as a Stochastic Process 


From another point of view, the probability space of a repeated-trials process may be viewed as 
a stochastic process whose tree diagram has the following properties: 


(i) Each branch point has the same outcomes. 
(ii) All branches leading to the same outcome have the same probability. 


For example, the tree diagram for the repeated-trials process in Example 4.12 appears in Fig. 4-7. 
Observe that 


(i) Each branch point has outcomes a, b, c. 


(ii) All branches leading to outcome a have probability 1/2, to outcome b have probability 1/3, and to
outcome c have probability 1/6.


These two properties are expected as noted above. 
Solved Problems


CONDITIONAL PROBABILITY 


4.1. Three fair coins, a penny, a nickel, and a dime, are tossed. Find the probability p that they are 
all heads if: (a) the penny is heads, (b) at least one of the coins is heads, (c) the dime is tails. 


The sample space has eight elements: 
S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT} 


(a) If the penny is heads, the reduced sample space is A = {HHH, HHT, HTH, HTT}. All coins are 
heads in only 1 of the 4 cases; hence p = 1/4. 


(b) If one or more of the coins is heads, the reduced sample space is 
B = {HHH, HHT, HTH, HTT, THH, THT, TTH} 
All coins are heads in only 1 of the 7 cases; hence p = 1/7. 


(c) If the dime (third coin) is tails, the reduced sample space is C = {HHT, HTT, THT, TTT}. None 
contains all heads; hence p = 0. 


4.2. A billiard ball is drawn at random from a box containing 15 billiard balls numbered 1 to 15, and 
the number n is recorded. 


(a) Find the probability p that n exceeds 10. 
(b) If n is even, find the probability p that n exceeds 10.
(a) The n can be one of the 5 numbers 11, 12, 13, 14, 15. Hence p = 5/15 = 1/3.


(b) The reduced sample space E consists of the 7 even numbers, that is,
E = {2, 4, 6, 8, 10, 12, 14}. Of these, only 2, namely 12 and 14, exceed 10. Hence p = 2/7.


4.3. A pair of fair dice is thrown. Find the probability p that the sum is 10 or greater if: 
(a) 5 appears on the first die, (b) 5 appears on at least one die. 


Figure 3-3 shows the 36 ways the pair of dice can be thrown. 
(a) If 5 appears on the first die, the reduced sample space A has six elements:
A = {(5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6)}
The sum is 10 or more on 2 of the 6 outcomes, (5, 5) and (5, 6). Thus, p = 2/6 = 1/3.
(b) If 5 appears on at least one die, the reduced sample space B has 11 elements:
B = {(5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6), (1, 5), (2, 5), (3, 5), (4, 5), (6, 5)}
The sum is 10 or more on 3 of the 11 outcomes, (5, 5), (5, 6), and (6, 5). Thus, p = 3/11.


4.4. In a certain college, 25 percent of the students failed mathematics, 15 percent failed chemistry,
and 10 percent failed both mathematics and chemistry. A student is selected at random.
(a) If the student failed chemistry, what is the probability that he or she failed mathematics?
(b) If the student failed mathematics, what is the probability that he or she failed chemistry?
(c) What is the probability that the student failed mathematics or chemistry?
(d) What is the probability that the student failed neither mathematics nor chemistry?
(a) We seek P(M|C), the probability that the student failed mathematics, given that he or she failed
chemistry. By definition,

P(M|C) = P(M ∩ C)/P(C) = 0.10/0.15 = 10/15 = 2/3


(b) We seek P(C|M), the probability that the student failed chemistry, given that he or she failed
mathematics. By definition,

P(C|M) = P(M ∩ C)/P(M) = 0.10/0.25 = 10/25 = 2/5


(c) By the addition rule (Theorem 3.6),
P(M ∪ C) = P(M) + P(C) − P(M ∩ C) = 0.25 + 0.15 − 0.10 = 0.30


(d) Students who failed neither mathematics nor chemistry form the complement of the set M ∪ C, that
is, form the set (M ∪ C)^c. Hence


P((M ∪ C)^c) = 1 − P(M ∪ C) = 1 − 0.30 = 0.70
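
The four answers in the mathematics/chemistry problem follow from three rules: the definition of conditional probability, the addition rule, and the complement rule. A short Python sketch, assuming exact fractions for the given percentages (the function name `conditional` is hypothetical):

```python
from fractions import Fraction

def conditional(p_joint, p_given):
    """P(X | Y) = P(X and Y) / P(Y); defined only when P(Y) > 0."""
    if p_given <= 0:
        raise ValueError("conditioning event must have positive probability")
    return p_joint / p_given

# Given data: P(M) = 0.25, P(C) = 0.15, P(M and C) = 0.10.
p_m, p_c, p_both = Fraction(25, 100), Fraction(15, 100), Fraction(10, 100)

p_m_given_c = conditional(p_both, p_c)  # (a) failed math given failed chemistry
p_c_given_m = conditional(p_both, p_m)  # (b) failed chemistry given failed math
p_union = p_m + p_c - p_both            # (c) addition rule
p_neither = 1 - p_union                 # (d) complement rule
print(p_m_given_c, p_c_given_m, p_union, p_neither)
```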


4.5. A pair of fair dice is thrown. If the two numbers appearing are different, find the probability
p that: (a) the sum is 6, (b) an ace appears, (c) the sum is 4 or less.


There are 36 ways the pair of dice can be thrown (Fig. 3-3) and 6 of them, (1, 1), (2, 2), ..., (6, 6), have
the same numbers. Thus, the reduced sample space E will consist of 36 − 6 = 30 elements.


(a) The sum 6 can appear in 4 ways: (1,5), (2,4), (4,2), (5,1). [We cannot include (3,3) since the 
numbers must be different.] Thus, p = 4/30 = 2/15. 


(b) An ace can appear in 10 ways: (1,2), (1,3), ..., (1,6) and (2,1), (3,1), ..., (6,1). [We cannot 
include (1,1) since the numbers must be different.] Thus, p = 10/30 = 1/3. 


(c) The sum is 4 or less in 4 ways: (3, 1), (1,3), (2,1), (1,2). [We cannot include (1,1) and (2, 2) since 
the numbers must be different.] Thus, p = 4/30 = 2/15. 


4.6. Let A and B be events with P(A) = 0.6, P(B) = 0.3, and P(A ∩ B) = 0.2. Find:
(a) P(A|B) and P(B|A), (b) P(A ∪ B), (c) P(A^c) and P(B^c).
(a) By definition of conditional probability,

P(A|B) = P(A ∩ B)/P(B) = 0.2/0.3 = 2/3, P(B|A) = P(A ∩ B)/P(A) = 0.2/0.6 = 1/3

(b) By the addition rule (Theorem 3.6),
P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 0.6 + 0.3 − 0.2 = 0.7


(c) By the complement rule,


P(A^c) = 1 − P(A) = 1 − 0.6 = 0.4 and P(B^c) = 1 − P(B) = 1 − 0.3 = 0.7


4.7. Consider the data in Problem 4.6. Find: (a) P(A^c|B^c), (b) P(B^c|A^c).


First compute P(A^c ∩ B^c). By DeMorgan’s law, (A ∪ B)^c = A^c ∩ B^c. Hence, by the complement
rule,
P(A^c ∩ B^c) = P((A ∪ B)^c) = 1 − P(A ∪ B) = 1 − 0.7 = 0.3


(a) P(A^c|B^c) = P(A^c ∩ B^c)/P(B^c) = 0.3/0.7 = 3/7


(b) P(B^c|A^c) = P(A^c ∩ B^c)/P(A^c) = 0.3/0.4 = 3/4


4.8. Let A and B be events with P(A) = 3/8, P(B) = 5/8, and P(A ∪ B) = 3/4. Find P(A|B) and
P(B|A).


First find P(A ∩ B) using the addition rule P(A ∪ B) = P(A) + P(B) − P(A ∩ B). We have

3/4 = 3/8 + 5/8 − P(A ∩ B) or P(A ∩ B) = 1/4


Now use the definition of conditional probability to get


P(A|B) = P(A ∩ B)/P(B) = (1/4)/(5/8) = 2/5 and P(B|A) = P(A ∩ B)/P(A) = (1/4)/(3/8) = 2/3


4.9. Find P(B|A) if: (a) A is a subset of B, (b) A and B are mutually exclusive (disjoint). 
[Assume P(A) > 0.] 


(a) If A is a subset of B [as pictured in Fig. 4-8(a)], then whenever A occurs, B must occur; hence
P(B|A) = 1. Alternately, if A is a subset of B, then A ∩ B = A; hence


P(B|A) = P(A ∩ B)/P(A) = P(A)/P(A) = 1


(b) If A and B are mutually exclusive, that is, disjoint [as pictured in Fig. 4-8(b)], then whenever A occurs,
B cannot occur; hence P(B|A) = 0. Alternately, if A and B are disjoint, then A ∩ B = ∅; hence


P(B|A) = P(A ∩ B)/P(A) = P(∅)/P(A) = 0/P(A) = 0


Fig. 4-8
4.10. Let E be an event for which P(E) > 0. Show that the conditional probability function P(·|E)
satisfies the axioms of a probability space, that is,


[P1] For any event A, we have P(A|E) ≥ 0.
[P2] For the certain event S, we have P(S|E) = 1.
[P3] For any two disjoint events A and B, we have
P(A ∪ B|E) = P(A|E) + P(B|E)
[P3'] For any infinite sequence of mutually disjoint events A_1, A_2, ..., we have
P(A_1 ∪ A_2 ∪ ···|E) = P(A_1|E) + P(A_2|E) + ···
(a) We have P(A ∩ E) ≥ 0 and P(E) > 0; hence


P(A|E) = P(A ∩ E)/P(E) ≥ 0
Thus, [P1] holds.
(b) We have S ∩ E = E; hence


P(S|E) = P(S ∩ E)/P(E) = P(E)/P(E) = 1


Thus, [P2] holds.
(c) If A and B are disjoint events, then so are A ∩ E and B ∩ E. Furthermore,


(A ∪ B) ∩ E = (A ∩ E) ∪ (B ∩ E)


Hence,
P[(A ∪ B) ∩ E] = P[(A ∩ E) ∪ (B ∩ E)] = P(A ∩ E) + P(B ∩ E)
Therefore
P(A ∪ B|E) = P[(A ∪ B) ∩ E]/P(E) = [P(A ∩ E) + P(B ∩ E)]/P(E)
= P(A ∩ E)/P(E) + P(B ∩ E)/P(E) = P(A|E) + P(B|E)


Thus, [P3] holds.


(d) [Similar to (c).] If A_1, A_2, ... are mutually disjoint events, then so are A_1 ∩ E, A_2 ∩ E, .... Also,
by the generalized distributive law,


(A_1 ∪ A_2 ∪ ···) ∩ E = (A_1 ∩ E) ∪ (A_2 ∩ E) ∪ ···
Thus
P[(A_1 ∪ A_2 ∪ ···) ∩ E] = P[(A_1 ∩ E) ∪ (A_2 ∩ E) ∪ ···]
= P(A_1 ∩ E) + P(A_2 ∩ E) + ···
Therefore


P(A_1 ∪ A_2 ∪ ···|E) = P[(A_1 ∪ A_2 ∪ ···) ∩ E]/P(E)
= [P(A_1 ∩ E) + P(A_2 ∩ E) + ···]/P(E) = P(A_1 ∩ E)/P(E) + P(A_2 ∩ E)/P(E) + ···
= P(A_1|E) + P(A_2|E) + ···
Thus, [P3'] holds.
MULTIPLICATION THEOREM


4.11. A class has 12 men and 4 women. Suppose 3 students are selected at random from the
class. Find the probability p that they are all men.


The probability that the first student is a man is 12/16 since there are 12 men out of the 16 
students. If the first student is a man, then the probability that the second student is a man is 11/15 since 
there are 11 men left out of the 15 students left. Finally, if the first 2 students are men, then the 
probability that the third student is a man is 10/14 since there are now only 10 men out of the 14 students 
left. Accordingly, by the Multiplication Theorem 4.2, the probability that all 3 are men is 


p = (12/16) · (11/15) · (10/14) = 11/28
Another Method: There are C(16, 3) = 560 ways to select 3 students out of 16 students, and
C(12, 3) = 220 ways to select 3 men from the 12 men. Thus


p = 220/560 = 11/28
A Third Method: Suppose the students are selected one after the other. Then there are 16 · 15 · 14
ways to select 3 students, and there are 12 · 11 · 10 ways to select the 3 men. Thus
p = (12 · 11 · 10)/(16 · 15 · 14) = 11/28
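
The three methods for the all-men selection can be compared numerically. This is a small Python sketch, assuming Python 3.8 or later for `math.comb` and `math.perm`; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb, perm

# Method 1: multiplication theorem, one conditional factor per draw.
p_chain = Fraction(12, 16) * Fraction(11, 15) * Fraction(10, 14)

# Method 2: unordered selections, C(12, 3) / C(16, 3).
p_comb = Fraction(comb(12, 3), comb(16, 3))

# Method 3: ordered selections, (12*11*10) / (16*15*14).
p_perm = Fraction(perm(12, 3), perm(16, 3))

print(p_chain, p_comb, p_perm)
```

All three agree because the ordered count is just the unordered count multiplied by 3! in both numerator and denominator.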


4.12. A person is dealt 5 cards from an ordinary 52-card deck (Fig. 3-4). Find the probability p that
they are all spades.


The probability that the first card is a spade is 13/52, that the second is a spade is 12/51, that the third
is a spade is 11/50, that the fourth is a spade is 10/49, and that the fifth is a spade is 9/48. (We assume
in each case that the previous cards were spades.) Thus, by the Multiplication Theorem 4.2,


p = (13/52) · (12/51) · (11/50) · (10/49) · (9/48) = 33/66,640 ≈ 0.00049


Another Method: There are C(52, 5) ways to select 5 cards from the 52-card deck, and C(13, 5) ways
to select 5 spades from the 13 spades. Thus


p = C(13, 5)/C(52, 5) = 1287/2,598,960 ≈ 0.00049


4.13. A box contains 7 red marbles and 3 white marbles. Three marbles are drawn from the box one
after the other. Find the probability p that the first 2 are red and the third is white.


The probability that the first marble is red is 7/10 since there are 7 red marbles out of the 10
marbles. If the first marble is red, then the probability that the second marble is red is 6/9 since there are
6 red marbles out of the remaining 9 marbles. Finally, if the first 2 marbles are red, then the probability
that the third marble is white is 3/8 since there are 3 white marbles out of the remaining 8 marbles in the
box. Accordingly, by the Multiplication Theorem 4.2,


p = (7/10) · (6/9) · (3/8) = 7/40 = 0.175 = 17.5%


4.14. Students in a class are selected at random, one after the other, for an examination. Find the
probability p that the men and women in the class alternate if:


(a) the class consists of 4 men and 3 women, (b) the class consists of 3 men and 3 women.


(a) If the men and women are to alternate, then the first student must be a man. The probability that
the first is a man is 4/7. If the first is a man, the probability that the second is a woman is 3/6 since
there are 3 women out of the 6 students left. Continuing in this manner, we obtain that the
probability that the third is a man is 3/5, that the fourth is a woman is 2/4, that the fifth is a man is 2/3,
that the sixth is a woman is 1/2, and that the last is a man is 1/1. Thus


p = (4/7) · (3/6) · (3/5) · (2/4) · (2/3) · (1/2) · (1/1) = 1/35


(b) There are two mutually exclusive cases: the first student is a man and the first is a woman. If the
first student is a man, then, by the multiplication theorem, the probability p1 that the students
alternate is

p1 = (3/6) · (3/5) · (2/4) · (2/3) · (1/2) · (1/1) = 1/20


If the first student is a woman, then, by the multiplication theorem, the probability p2 that the
students alternate is


p2 = (3/6) · (3/5) · (2/4) · (2/3) · (1/2) · (1/1) = 1/20


Thus, p = p1 + p2 = 1/20 + 1/20 = 1/10.
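
The alternation probabilities can also be checked by counting gender patterns rather than multiplying conditional factors. The sketch below is an independent check, not the method of the text: it assumes every ordering of the students is equally likely, so every distinct M/W pattern is equally likely too. The function name `p_alternate` is hypothetical.

```python
from fractions import Fraction
from itertools import permutations

def p_alternate(men, women):
    """Probability that a random ordering alternates between M and W."""
    # Distinct gender patterns; each arises from men! * women! orderings,
    # so the patterns are equiprobable.
    patterns = set(permutations("M" * men + "W" * women))
    alternating = [
        p for p in patterns
        if all(p[i] != p[i + 1] for i in range(len(p) - 1))
    ]
    return Fraction(len(alternating), len(patterns))

print(p_alternate(4, 3), p_alternate(3, 3))
```

With 4 men and 3 women only the pattern MWMWMWM alternates; with 3 and 3 there are two such patterns, matching the two cases in part (b).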


FINITE STOCHASTIC PROCESSES 


4.15. Let X, Y, Z be three coins in a box. Suppose X is a fair coin, Y is two-headed, and Z is 
weighted so that the probability of heads is 1/3. A coin is selected at random and is 
tossed. (a) Find the probability that heads appears, that is, find P(H). (b) If heads appears, 
find the probability that it is the fair coin X, that is, find P(X|H). (c) If tails appears, find the 
probability it is the coin Z, that is, find P(Z|T). 


(a) Construct the corresponding two-step stochastic tree diagram in Fig. 4-9(a).
Heads appears along three of the paths; hence

P(H) = (1/3)(1/2) + (1/3)(1) + (1/3)(1/3) = 11/18


(b) Note X and heads H appear only along the top path in Fig. 4-9(a); hence

P(X ∩ H) = (1/3)(1/2) = 1/6 and so P(X|H) = P(X ∩ H)/P(H) = (1/6)/(11/18) = 3/11


Fig. 4-9


(c) P(T) = 1 − P(H) = 1 − 11/18 = 7/18. Alternately, tails appears along two of the paths and so

P(T) = (1/3)(1/2) + (1/3)(2/3) = 7/18


Note Z and tails T appear only along the bottom path in Fig. 4-9(a); hence

P(Z ∩ T) = (1/3)(2/3) = 2/9 and so P(Z|T) = P(Z ∩ T)/P(T) = (2/9)/(7/18) = 4/7
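
The two-step tree for the three coins can be encoded as a small table and the path probabilities summed mechanically. A minimal Python sketch, assuming each coin is chosen with probability 1/3 as in the problem; the dictionary layout is an illustrative choice.

```python
from fractions import Fraction as F

# coin -> (probability the coin is selected, probability of heads on that coin)
coins = {
    "X": (F(1, 3), F(1, 2)),  # fair coin
    "Y": (F(1, 3), F(1)),     # two-headed coin
    "Z": (F(1, 3), F(1, 3)),  # weighted coin
}

# Sum over the paths of the tree that end in heads.
p_heads = sum(pick * heads for pick, heads in coins.values())
p_tails = 1 - p_heads

# Conditional probabilities: (probability of the single path) / (total).
p_x_given_h = coins["X"][0] * coins["X"][1] / p_heads
p_z_given_t = coins["Z"][0] * (1 - coins["Z"][1]) / p_tails
print(p_heads, p_x_given_h, p_tails, p_z_given_t)
```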


4.16. Suppose the following three boxes are given:


Box A contains 3 red and 5 white marbles.
Box B contains 2 red and 1 white marbles.
Box C contains 2 red and 3 white marbles.


A box is selected at random, and a marble is randomly drawn from the box. If the marble is
red, find the probability that it came from box A.


Construct the corresponding stochastic tree diagram as in Fig. 4-9(b). We seek P(A|R), the
probability that A was selected, given that the marble is red. Thus, it is necessary to find P(A ∩ R) and
P(R). Note that A and R only occur on the top path; hence P(A ∩ R) = (1/3)(3/8) = 1/8. There are
three paths leading to a red marble R; hence


P(R) = (1/3)(3/8) + (1/3)(2/3) + (1/3)(2/5) = 173/360 ≈ 0.48


Thus


P(A|R) = P(A ∩ R)/P(R) = (1/8)/(173/360) = 45/173 ≈ 0.26


4.17. Box A contains 9 cards numbered 1 through 9, and box B contains 5 cards numbered 1 through
5. A box is selected at random, and a card is randomly drawn from the box. If the number
is even, find the probability that the card came from box A.


Construct the corresponding stochastic tree diagram as in Fig. 4-10(a). We seek P(A|E), the
probability that A was selected, given that the number is even. Thus, it is necessary to find P(A ∩ E) and
P(E). Note that A and E only occur on the top path; hence P(A ∩ E) = (1/2)(4/9) = 2/9. Two paths
lead to an even number E; hence


P(E) = (1/2)(4/9) + (1/2)(2/5) = 19/45 and so P(A|E) = P(A ∩ E)/P(E) = (2/9)/(19/45) = 10/19 ≈ 0.53


Fig. 4-10
4.18. A box contains 3 red marbles and 7 white marbles. A marble is drawn from the box and the
marble is replaced by a marble of the other color. A second marble is drawn from the box.


(a) Find the probability p that the second marble is red.


(b) If both marbles were of the same color, find the probability p that they both were white.


Construct the corresponding stochastic tree diagram as in Fig. 4-10(b).


(a) Two paths lead to a red marble R; hence


p = (3/10)(2/10) + (7/10)(4/10) = 17/50 = 0.34

(b) Note that W appears twice only on the bottom path; hence P(WW) = (7/10)(6/10) = 21/50 is the
probability that both were white. There are two paths, the top path and the bottom path, where the
marbles are the same color. Thus


P(RR or WW) = (3/10)(2/10) + (7/10)(6/10) = 12/25


is the probability of the same color, the reduced sample space. Therefore


p = (21/50)/(12/25) = 7/8 = 0.875


4.19. A box contains a fair coin A and a two-headed coin B. A coin is selected at random and tossed
twice.


(a) If heads appears both times, find the probability p that the coin is two-headed.


(b) If tails appears both times, find the probability p that the coin is two-headed.


Construct the corresponding stochastic tree diagram as in Fig. 4-11.


Fig. 4-11


(a) We seek P(B|HH). Heads appears twice only in the top path and in the bottom path. Hence

P(HH) = (1/2)(1/4) + (1/2)(1) = 5/8


On the other hand, P(B ∩ HH) = P(B) = 1/2. Thus


p = P(B|HH) = P(B ∩ HH)/P(HH) = (1/2)/(5/8) = 4/5


(b) If tails appears then it could not be the two-headed coin B. Hence p = 0.
4.20. Suppose the following two boxes are given:


Box A contains 3 red and 2 white marbles.
Box B contains 2 red and 5 white marbles.


A box is selected at random; a marble is drawn and put into the other box; then a marble is
drawn from the second box. Find the probability p that both marbles drawn are of the same
color.


Construct the corresponding stochastic tree diagram as in Fig. 4-12. Note that this is a three-step
stochastic process: (1) choosing a box, (2) choosing a marble, (3) choosing a second marble. Note that
if box A is selected and a red marble R is drawn and put into box B, then box B will have 3 red marbles
and 5 white marbles.


There are 4 paths which lead to 2 marbles of the same color; hence

p = (1/2)(3/5)(3/8) + (1/2)(2/5)(3/4) + (1/2)(2/7)(2/3) + (1/2)(5/7)(1/2) = 901/1680 ≈ 0.536


Fig. 4-12
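
The three-step process of Problem 4.20 (choose a box, draw and transfer a marble, draw from the other box) can be walked path by path in code. This is a sketch under the assumptions of the problem, with each box chosen with probability 1/2; the function name `same_color_probability` and the dictionary encoding of the boxes are hypothetical.

```python
from fractions import Fraction

def same_color_probability(box1, box2):
    """P(both drawn marbles have the same color) for the transfer process."""
    total = Fraction(0)
    # Step 1: either box is the source with probability 1/2.
    for source, target in ((box1, box2), (box2, box1)):
        n_source = sum(source.values())
        # Step 2: draw a marble of each possible color from the source.
        for color, count in source.items():
            p_first = Fraction(count, n_source)
            # The drawn marble is put into the other box before step 3.
            after = dict(target)
            after[color] = after.get(color, 0) + 1
            # Step 3: draw the same color from the enlarged target box.
            p_second = Fraction(after[color], sum(after.values()))
            total += Fraction(1, 2) * p_first * p_second
    return total

box_a = {"R": 3, "W": 2}
box_b = {"R": 2, "W": 5}
p = same_color_probability(box_a, box_b)
print(p, float(p))
```

Each loop iteration corresponds to one of the 4 same-color paths in the tree diagram.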


LAW OF TOTAL PROBABILITY, BAYES’ RULE 


4.21. In a certain city, 40 percent of the people consider themselves Conservatives (C), 35 percent
consider themselves to be Liberals (L), and 25 percent consider themselves to be Independents
(I). During a particular election, 45 percent of the Conservatives voted, 40 percent of the
Liberals voted, and 60 percent of the Independents voted. Suppose a person is randomly selected.


(a) Find the probability that the person voted.
(b) If the person voted, find the probability that the voter is


(i) Conservative, (ii) Liberal, (iii) Independent.
(a) Let V denote the event that a person voted. We need P(V). By the law of total probability,


P(V) = P(C)P(V|C) + P(L)P(V|L) + P(I)P(V|I)
= (0.40)(0.45) + (0.35)(0.40) + (0.25)(0.60) = 0.47


(b) Use Bayes’ rule:

(i) P(C|V) = P(C)P(V|C)/P(V) = (0.40)(0.45)/0.47 = 18/47 ≈ 38.3%
(ii) P(L|V) = P(L)P(V|L)/P(V) = (0.35)(0.40)/0.47 = 14/47 ≈ 29.8%
(iii) P(I|V) = P(I)P(V|I)/P(V) = (0.25)(0.60)/0.47 = 15/47 ≈ 31.9%

4.22. In a certain college, 4 percent of the men and 1 percent of the women are taller than 6 feet. 
Furthermore, 60 percent of the students are women. Suppose a randomly selected student is 
taller than 6 feet. Find the probability that the student is a woman. 


Let A = {students taller than 6 feet}. We seek P(W|A), the probability that a student is a woman, 
given that the student is taller than 6 feet. By Bayes’ formula, 


P(W|A) = P(W)P(A|W)/[P(W)P(A|W) + P(M)P(A|M)] = (0.60)(0.01)/[(0.60)(0.01) + (0.40)(0.04)] = 3/11


4.23. Three machines A, B, and C produce, respectively, 40 percent, 10 percent, and 50 percent of the 
items in a factory. The percentage of defective items produced by the machines is, respectively,
2 percent, 3 percent, and 4 percent. An item from the factory is selected at random. 

(a) Find the probability that the item is defective. 
(b) If the item is defective, find the probability that the item was produced by: 
(i) machine A, 

(ii) machine B, 

(iii) machine C. 
(a) Let D denote the event that an item is defective. Then, by the law of total probability, 

P(D) = P(A)P(D|A) + P(B)P(D|B) + P(C)P(D|C) 
= (0.40)(0.02) + (0.10)(0.03) + (0.50)(0.04) = 0.031 = 3.1% 


(b) Use Bayes’ formula to obtain


(i) P(A|D) = P(A)P(D|A)/P(D) = (0.40)(0.02)/0.031 = 8/31 ≈ 25.8%
(ii) P(B|D) = P(B)P(D|B)/P(D) = (0.10)(0.03)/0.031 = 3/31 ≈ 9.7%
(iii) P(C|D) = P(C)P(D|C)/P(D) = (0.50)(0.04)/0.031 = 20/31 ≈ 64.5%
4.24. Suppose a student dormitory in a college consists of:


(1) 40 percent freshmen of whom 15 percent are New York residents 
(2) 25 percent sophomores of whom 40 percent are New York residents 
(3) 20 percent juniors of whom 25 percent are New York residents 

(4) 15 percent seniors of whom 20 percent are New York residents 


A student is randomly selected from the dormitory. 


(a) Find the probability that the student is a New York resident. 


(b) If the student is a New York resident, find the probability that the student is a: 
(i) freshman, (ii) junior. 


Let A, B, C, D denote, respectively, the set of freshmen, sophomores, juniors, and seniors, and let E 
denote the set of students who are New York residents. 


(a) We find P(E) by the law of total probability. We have:


P(E) = (0.40)(0.15) + (0.25)(0.40) + (0.20)(0.25) + (0.15)(0.20)
= 0.06 + 0.10 + 0.05 + 0.03 = 0.24 = 24%
(b) Use Bayes’ formula to obtain:


(i) P(A|E) = P(A)P(E|A)/P(E) = (0.40)(0.15)/0.24 = 6/24 = 25%
(ii) P(C|E) = P(C)P(E|C)/P(E) = (0.20)(0.25)/0.24 = 5/24 ≈ 20.8%


4.25. A box contains 10 coins where 5 coins are two-headed, 3 coins are two-tailed, and 2 are fair
coins. A coin is chosen at random and tossed.


(a) Find the probability that a head appears.
(b) If a head appears, find the probability that the coin is fair.

Let X, Y, Z denote, respectively, the two-headed coins, the two-tailed coins, and the fair coins. Then
P(X) = 0.5, P(Y) = 0.3, P(Z) = 0.2. Note P(H|X) = 1, that is, a two-headed coin must yield a
head. Similarly, P(H|Y) = 0 and P(H|Z) = 0.5. Figure 4-13 is a stochastic tree (with the root at the
top) describing the given data.


(a) By the law of total probability or by adding the probabilities of the three paths in Fig. 4-13 leading
to H, we get
P(H) = (0.5)(1) + (0.3)(0) + (0.2)(0.5) = 0.6
(b) By Bayes’ rule,

P(Z|H) = P(Z)P(H|Z)/P(H) = (0.2)(0.5)/0.6 = 1/6 ≈ 16.7%


INDEPENDENT EVENTS


4.26. Two men A and B fire at a target. Suppose P(A) = 1/3 and P(B) = 1/5 denote their probabilities
of hitting the target. (We assume that the events A and B are independent.) Find the
probability that:


(a) A does not hit the target. (c) One of them hits the target.
(b) Both hit the target. (d) Neither hits the target.


(a) By the complement rule,


P(not A) = P(A^c) = 1 − P(A) = 1 − 1/3 = 2/3
(b) Since the events A and B are independent,

P(A and B) = P(A ∩ B) = P(A) · P(B) = 1/3 · 1/5 = 1/15


(c) By the addition rule (Theorem 3.6),


P(A or B) = P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 1/3 + 1/5 − 1/15 = 7/15


(d) By DeMorgan’s law, “neither A nor B” is the complement of A ∪ B. [See Problem 3.1(d).]
Hence


P(neither A nor B) = P((A ∪ B)^c) = 1 − P(A ∪ B) = 1 − 7/15 = 8/15


4.27. Box A contains 5 red marbles and 3 blue marbles and Box B contains 3 red and 2 blue. A
marble is drawn at random from each box.


(a) Find the probability p that both marbles are red.
(b) Find the probability p that one is red and one is blue.


(a) The probability of choosing a red marble from A is 5/8 and a red marble from B is 3/5. Since the events
are independent,

p = 5/8 · 3/5 = 3/8


(b) There are two (mutually exclusive) events:


X: a red marble from A and a blue marble from B


Y: a blue marble from A and a red marble from B


We have


P(X) = 5/8 · 2/5 = 1/4 and P(Y) = 3/8 · 3/5 = 9/40


Accordingly, since X and Y are mutually exclusive,


p = P(X) + P(Y) = 1/4 + 9/40 = 19/40
4.28. Let A be the event that a man will live 10 more years, and let B be the event that his wife lives
10 more years. Suppose P(A) = 1/4 and P(B) = 1/3. Assuming A and B are independent events,
find the probability that, in 10 years


(a) Both will be alive. (c) Neither will be alive.
(b) At least one will be alive. (d) Only the wife will be alive.
(a) We seek P(A ∩ B). Since A and B are independent events,

P(A ∩ B) = P(A) · P(B) = 1/4 · 1/3 = 1/12


(b) We seek P(A ∪ B). By the addition rule (Theorem 3.6),


P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 1/4 + 1/3 − 1/12 = 1/2


(c) By DeMorgan’s law, “neither A nor B” is the complement of A ∪ B. [Problem 3.1(b).] Hence

P(A^c ∩ B^c) = P((A ∪ B)^c) = 1 − P(A ∪ B) = 1 − 1/2 = 1/2


Alternately, we have P(A^c) = 3/4 and P(B^c) = 2/3; and, since A^c and B^c are independent,


P(A^c ∩ B^c) = 3/4 · 2/3 = 1/2

(d) We seek P(A^c ∩ B). Since A^c and B are also independent,

P(A^c ∩ B) = 3/4 · 1/3 = 1/4


4.29. Consider the following events for a family with children: 
A = {children of both sexes}, B = {at most one boy} 


(a) Show that A and B are independent events if a family has 3 children. 
(b) Show that A and B are dependent events if a family has only 2 children. 


(a) We have the equiprobable space S = {bbb, bbg, bgb, bgg, gbb, gbg, ggb, ggg}. Here


A = {bbg, bgb, bgg, gbb, gbg, ggb} and so P(A) = 6/8 = 3/4
B = {bgg, gbg, ggb, ggg} and so P(B) = 4/8 = 1/2
A ∩ B = {bgg, gbg, ggb} and so P(A ∩ B) = 3/8
Since P(A)P(B) = 3/4 · 1/2 = 3/8 = P(A ∩ B), A and B are independent.
(b) We have the equiprobable space S = {bb, bg, gb, gg}. Here

A = {bg, gb} and so P(A) = 1/2
B = {bg, gb, gg} and so P(B) = 3/4
A ∩ B = {bg, gb} and so P(A ∩ B) = 1/2


Since P(A)P(B) = 3/8 ≠ P(A ∩ B), A and B are dependent.
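
The independence test P(A ∩ B) = P(A)P(B) for the family problem is easy to automate by enumerating the equiprobable space. A minimal Python sketch; the function name `independent_for` and the string encoding of births are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

def independent_for(n_children):
    """Check whether A = {both sexes} and B = {at most one boy} are independent."""
    S = ["".join(t) for t in product("bg", repeat=n_children)]

    def P(event):
        return Fraction(sum(1 for s in S if event(s)), len(S))

    def A(s):
        return "b" in s and "g" in s   # children of both sexes

    def B(s):
        return s.count("b") <= 1       # at most one boy

    return P(lambda s: A(s) and B(s)) == P(A) * P(B)

print(independent_for(3), independent_for(2))
```

The same two events are independent in one sample space and dependent in another, so independence is a property of the probability space and not of the verbal description alone.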
4.30. Three men A, B, C fire at a target. Suppose P(A) = 1/6, P(B) = 1/4, P(C) = 1/3 denote their
probabilities of hitting the target. (We assume that the events that A, B, C hit the target are
independent.)


(a) Find the probability p that they all hit the target.
(b) Find the probability p that they all miss the target.
(c) Find the probability p that at least one of them hits the target.


(a) We seek P(A ∩ B ∩ C). Since A, B, C are independent events,
P(A ∩ B ∩ C) = P(A) · P(B) · P(C) = 1/6 · 1/4 · 1/3 = 1/72 ≈ 1.4%


(b) We seek P(A^c ∩ B^c ∩ C^c). We have P(A^c) = 1 − P(A) = 5/6. Similarly, P(B^c) = 3/4 and
P(C^c) = 2/3. Since A, B, C are independent events, so are A^c, B^c, C^c. Hence


P(A^c ∩ B^c ∩ C^c) = P(A^c) · P(B^c) · P(C^c) = 5/6 · 3/4 · 2/3 = 5/12 ≈ 41.7%
(c) Let D be the event that one or more of them hit the target. Then D is the complement of the event
A^c ∩ B^c ∩ C^c, that they all miss the target. Thus


P(D) = 1 − P(A^c ∩ B^c ∩ C^c) = 1 − 5/12 = 7/12 ≈ 58.3%


4.31. Consider the data in Problem 4.30. (a) Find the probability p that exactly one of them hits the
target. (b) If the target is hit only once, find the probability p that it was the first man A.


(a) Let E be the event that exactly one of them hit the target. Then
E = (A ∩ B^c ∩ C^c) ∪ (A^c ∩ B ∩ C^c) ∪ (A^c ∩ B^c ∩ C)


That is, if only one man hit the target then it was only A, A ∩ B^c ∩ C^c, or only B, A^c ∩ B ∩ C^c,
or only C, A^c ∩ B^c ∩ C. These three events are mutually exclusive. Thus, we obtain (using
Problem 4.79)


p = P(E) = P(A ∩ B^c ∩ C^c) + P(A^c ∩ B ∩ C^c) + P(A^c ∩ B^c ∩ C)
= 1/6 · 3/4 · 2/3 + 5/6 · 1/4 · 2/3 + 5/6 · 3/4 · 1/3 = 1/12 + 5/36 + 5/24 = 31/72 ≈ 43.1%


(b) We seek P(A|E), the probability that A hit the target given that only one man hit the target. Now
A ∩ E = A ∩ B^c ∩ C^c is the event that only A hit the target. Also, by (a), P(A ∩ E) = 1/12 and
P(E) = 31/72; hence


P(A|E) = P(A ∩ E)/P(E) = (1/12)/(31/72) = 6/31 ≈ 19.4%
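
For three independent marksmen, every hit/miss pattern has probability equal to the product of the individual hit or miss probabilities, so the events "exactly one hit" and "only A hit" can be summed over patterns. A short Python sketch under that independence assumption; `pattern_probability` is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import product

hit = {"A": Fraction(1, 6), "B": Fraction(1, 4), "C": Fraction(1, 3)}

def pattern_probability(pattern):
    """Probability of one hit/miss pattern, given in the order A, B, C."""
    prob = Fraction(1)
    for name, did_hit in zip(hit, pattern):
        prob *= hit[name] if did_hit else 1 - hit[name]
    return prob

patterns = list(product([True, False], repeat=3))

# E: exactly one of the three hits the target.
p_exactly_one = sum(pattern_probability(p) for p in patterns if sum(p) == 1)
# A and E together: only A hits.
p_only_a = pattern_probability((True, False, False))
p_a_given_one = p_only_a / p_exactly_one
print(p_exactly_one, p_a_given_one)
```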


4.32. Let S = {a, b, c, d} be an equiprobable space; hence each elementary event has probability 1/4.
Consider the events:


A = {a, d}, B = {b, d}, C = {c, d}
(a) Show that A, B, C are pairwise independent.
(b) Show that A, B, C are not independent.
(a) Here P(A) = P(B) = P(C) = 1/2. Since A ∩ B = {d},


P(A ∩ B) = P({d}) = 1/4 = P(A)P(B)


Hence A and B are independent. Similarly, A and C are independent and B and C are
independent.


(b) Here A ∩ B ∩ C = {d}, and so P(A ∩ B ∩ C) = 1/4. Therefore

P(A)P(B)P(C) = 1/8 ≠ P(A ∩ B ∩ C)


Accordingly, A, B, C are not independent.
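
The distinction between pairwise independence and independence of the whole family can be verified with set operations. This is a minimal Python sketch on the four-point equiprobable space of this problem; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

S = {"a", "b", "c", "d"}
events = {"A": {"a", "d"}, "B": {"b", "d"}, "C": {"c", "d"}}

def P(E):
    """Probability of event E in the equiprobable space S."""
    return Fraction(len(E), len(S))

# Product rule for every pair of events.
pairwise = all(
    P(events[x] & events[y]) == P(events[x]) * P(events[y])
    for x, y in combinations(events, 2)
)

# Product rule for all three events together.
triple_intersection = events["A"] & events["B"] & events["C"]
triple = P(triple_intersection) == P(events["A"]) * P(events["B"]) * P(events["C"])

print(pairwise, triple)
```

Independence of A, B, C requires both conditions, so the pairwise check alone is not sufficient.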


4.33. Suppose S = {1,2,3,4,5,6, 7,8} is an equiprobable space; hence each elementary event has 
probability 1/8. Consider the events: 


A = {1, 2, 3, 4}, B = {2, 3, 4, 5}, C = {4, 6, 7, 8}
(a) Show that P(A ∩ B ∩ C) = P(A)P(B)P(C).
(b) Show that
(i) P(A ∩ B) ≠ P(A)P(B),
(ii) P(A ∩ C) ≠ P(A)P(C),
(iii) P(B ∩ C) ≠ P(B)P(C).
(a) Here P(A) = P(B) = P(C) = 4/8 = 1/2. Since A ∩ B ∩ C = {4},


P(A ∩ B ∩ C) = 1/8 = P(A)P(B)P(C)
(b) (i) A ∩ B = {2, 3, 4}, so P(A ∩ B) = 3/8. But P(A)P(B) = 1/4; hence P(A ∩ B) ≠ P(A)P(B).


(ii) A ∩ C = {4}, so P(A ∩ C) = 1/8. But P(A)P(C) = 1/4; hence P(A ∩ C) ≠ P(A)P(C).
(iii) B ∩ C = {4}, so P(B ∩ C) = 1/8. But P(B)P(C) = 1/4; hence P(B ∩ C) ≠ P(B)P(C).


4.34. Prove: Suppose A and B are independent events. Then A^c and B^c are independent events.


We need to show that P(A^c ∩ B^c) = P(A^c) · P(B^c). Let P(A) = x and P(B) = y. Then P(A^c) = 1 − x
and P(B^c) = 1 − y. Since A and B are independent, P(A ∩ B) = P(A) · P(B) = xy. Thus, by the
addition rule (Theorem 3.6),


P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = x + y − xy
By DeMorgan’s law, (A ∪ B)^c = A^c ∩ B^c; hence
P(A^c ∩ B^c) = P((A ∪ B)^c) = 1 − P(A ∪ B) = 1 − x − y + xy
On the other hand,


P(A^c) · P(B^c) = (1 − x)(1 − y) = 1 − x − y + xy


Thus, P(A^c ∩ B^c) = P(A^c) · P(B^c), and so A^c and B^c are independent.
Similarly, one can show that A and B^c, as well as A^c and B, are independent.


INDEPENDENT REPEATED TRIALS 


4.35. A fair coin is tossed three times. Find the probability that there will appear: 
(a) three heads, (b) exactly two heads, (c) exactly one head, (d) no heads. 
Let H denote a head and T a tail on any toss. The three tosses can be modeled as an equiprobable
space in which there are eight possible outcomes:
S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}


However, since the result on any one toss does not depend on the result of any other toss, the three tosses
may be modeled as three independent trials in which P(H) = 1/2 and P(T) = 1/2 on any one trial. Then
(a) P(three heads) = P(HHH) = 1/2 · 1/2 · 1/2 = 1/8

(b) P(exactly two heads) = P(HHT or HTH or THH) = 1/2 · 1/2 · 1/2 + 1/2 · 1/2 · 1/2 + 1/2 · 1/2 · 1/2 = 3/8

(c) As in (b), P(exactly one head) = P(exactly two tails) = 3/8

(d) As in (a), P(no heads) = P(TTT) = 1/8


4.36. Suppose only horses a, b, c, d race together yielding the sample space S = {a,b,c,d}, and 
suppose the probabilities of winning are as follows: 


P(a) = 0.2, P(b) = 0.5, P(c) = 0.1, P(d) =0.2 
They race three times. 
(a) Describe and find the number of elements in the product probability space S_3.
(b) Find the probability that the same horse wins all three races.
(c) Find the probability that a, b, c each wins one race.
(a) By definition, S_3 = S × S × S = {(x, y, z): x, y, z ∈ S} and
P((x, y, z)) = P(x)P(y)P(z)
Thus, in particular, S_3 contains 4^3 = 64 elements.


(b) Writing xyz for (x, y, z), we seek the probability of the event
A = {aaa, bbb, ccc, ddd}
By definition
P(aaa) = (0.2)^3 = 0.008, P(ccc) = (0.1)^3 = 0.001
P(bbb) = (0.5)^3 = 0.125, P(ddd) = (0.2)^3 = 0.008
Thus, P(A) = 0.008 + 0.125 + 0.001 + 0.008 = 0.142.
(c) We seek the probability of the event 


B = {abc, acb, bac, bca, cab, cba} 
Every element in B has the same probability (0.2)(0.5)(0.1) = 0.01. Hence P(B) = 6(0.01) = 0.06. 
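
The product probability space for three races can be built explicitly, which also confirms that its 64 point probabilities sum to 1. A Python sketch, assuming the winning probabilities stated in the problem; the dictionary name `S3` mirrors the notation S_3 but is otherwise an illustrative choice.

```python
from fractions import Fraction as F
from itertools import permutations, product

P = {"a": F(2, 10), "b": F(5, 10), "c": F(1, 10), "d": F(2, 10)}

# Product space: each ordered triple gets the product of the point probabilities.
S3 = {xyz: P[xyz[0]] * P[xyz[1]] * P[xyz[2]] for xyz in product(P, repeat=3)}

# (b) The same horse wins all three races.
p_same = sum(p for (x, y, z), p in S3.items() if x == y == z)

# (c) Horses a, b, c each win one race, in any order.
p_abc = sum(S3[t] for t in permutations("abc"))

print(len(S3), sum(S3.values()), p_same, p_abc)
```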


4.37. A certain soccer team wins (W) with probability 0.6, loses (L) with probability 0.3, and ties (T) 
with probability 0.1. The team plays three games over the weekend. (a) Determine the 
elements of the event A that the team wins at least twice and does not lose; and find P(A). 


(b) Determine the elements of the event B that the team wins, loses, and ties in some order; and 
find P(B). 


(a) A consists of all ordered triples with at least two W’s and no L’s. Thus 
A = {WWW, WWT, WTW, TWW} 
Since these events are mutually exclusive, 
P(A) = P(WWW) + P(WWT) + P(WTW) + P(TWW) 
= (0.6)(0.6)(0.6) + (0.6)(0.6)(0.1) + (0.6)(0.1)(0.6) + (0.1)(0.6)(0.6) 
= 0.216 + 0.036 + 0.036 + 0.036 = 0.324 = 32.4%


(b) Here B = {WLT, WTL, LWT, LTW, TWL, TLW}. Each element in B has the probability
(0.6)(0.3)(0.1) = 0.018. Hence


P(B) = 6(0.018) = 0.108 = 10.8%


4.38. A certain type of missile hits its target with probability p = 0.3. Find the minimum number n
of missiles that should be fired so that there is at least an 80 percent probability of hitting the
target.


The probability of missing the target is q = 1 − p = 0.7. Hence the probability that n missiles miss
the target is (0.7)^n. Thus, we seek the smallest n for which


1 − (0.7)^n > 0.80 or, equivalently, (0.7)^n < 0.20
Compute:
(0.7)^1 = 0.7, (0.7)^2 = 0.49, (0.7)^3 = 0.343, (0.7)^4 = 0.2401, (0.7)^5 = 0.16807


Thus, at least n = 5 missiles should be fired.
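
The search for the smallest n can be written as a loop that multiplies the miss probability until the required hit probability is reached. A minimal Python sketch using exact fractions to avoid rounding at the threshold; the function name `min_trials` is hypothetical.

```python
from fractions import Fraction

def min_trials(p_hit, target):
    """Smallest n with 1 - (1 - p_hit)**n >= target, for independent trials."""
    if not (0 < p_hit <= 1) or not (0 <= target < 1):
        raise ValueError("need 0 < p_hit <= 1 and 0 <= target < 1")
    miss = 1 - p_hit
    n, all_miss = 1, miss          # all_miss = probability that n shots all miss
    while 1 - all_miss < target:
        n += 1
        all_miss *= miss
    return n

n = min_trials(Fraction(3, 10), Fraction(8, 10))
print(n)
```

The loop terminates because (1 − p)^n tends to 0 whenever p > 0.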


4.39. The probability that a man hits a target is 1/3. He fires at the target n = 6 times. (a) Describe
and find the number of elements in the sample space S. (b) Let E be the event that he hits the
target exactly k = 2 times. List the elements of E and find the number n(E) of elements in
E. (c) Find P(E).
(a) S consists of all 6-element sequences consisting of S’s (successes) and F’s (failures); hence S contains
2^6 = 64 elements.
(b) E consists of all sequences with two S’s and four F’s; hence E consists of the following elements:
SSFFFF, SFSFFF, SFFSFF, SFFFSF, SFFFFS, FSSFFF, FSFSFF, FSFFSF,
FSFFFS, FFSSFF, FFSFSF, FFSFFS, FFFSSF, FFFSFS, FFFFSS
Observe that the list contains 15 elements. [This is expected since we are distributing k = 2 letters
S among the n = 6 positions in the sequence, and C(6, 2) = 15.] Thus n(E) = 15.


(c) Here P(S) = 1/3, so P(F) = 1 − P(S) = 2/3. Thus each of the above sequences occurs with the same
probability


p = (1/3)^2 (2/3)^4 = 16/729
Hence P(E) = 15(16/729) = 80/243 ≈ 33%.
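
The count n(E) = 15 and the value of P(E) can be confirmed both by listing the sequences and by the closed form C(n, k) p^k (1 − p)^(n−k). A short Python sketch; the variable names are illustrative, and the closed form is the standard binomial expression rather than a formula quoted from this problem.

```python
from fractions import Fraction
from itertools import product
from math import comb

p = Fraction(1, 3)  # probability of a success S on one shot

# All 2**6 sequences of S (success) and F (failure).
sequences = ["".join(t) for t in product("SF", repeat=6)]
E = [s for s in sequences if s.count("S") == 2]

# Sum the probability of each listed sequence.
p_listed = sum(p ** s.count("S") * (1 - p) ** s.count("F") for s in E)

# Closed form: choose the positions of the 2 successes among 6 shots.
p_closed = comb(6, 2) * p ** 2 * (1 - p) ** 4

print(len(sequences), len(E), p_listed, p_closed)
```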


4.40. Let S be a finite probability space and let T be the probability space of n independent trials in
S. Show that T is well defined. That is, show

(i) The probability of each element of T is nonnegative.

(ii) The sum of their probabilities is 1.


Suppose S = {a_1, a_2, ..., a_r}. Then each element of T is of the form


a_{i_1} a_{i_2} ··· a_{i_n} where i_1, i_2, ..., i_n ∈ {1, 2, ..., r}

Since each P(a_i) ≥ 0, we have
P(a_{i_1} a_{i_2} ··· a_{i_n}) = P(a_{i_1}) P(a_{i_2}) ··· P(a_{i_n}) ≥ 0


for every element of T. Hence (i) holds.
We prove (ii) by induction on n. It is obviously true for n = 1. Therefore, we consider n > 1 and
assume (ii) has been proved for n − 1. We have


Σ_{i_1, ..., i_n = 1}^{r} P(a_{i_1} a_{i_2} ··· a_{i_n}) = Σ_{i_1, ..., i_n = 1}^{r} P(a_{i_1}) P(a_{i_2}) ··· P(a_{i_n})
= Σ_{i_1, ..., i_{n−1} = 1}^{r} P(a_{i_1}) ··· P(a_{i_{n−1}}) Σ_{i_n = 1}^{r} P(a_{i_n})
= Σ_{i_1, ..., i_{n−1} = 1}^{r} P(a_{i_1}) ··· P(a_{i_{n−1}}) = 1


where the inner sum over i_n equals 1 because the probabilities of the points of S sum to 1, and
the last equality follows from the inductive hypothesis. Thus, (ii) also holds.


Supplementary Problems 


CONDITIONAL PROBABILITY 
4.41. A fair die is tossed. Consider events A = {1, 3, 5}, B = {2, 3, 5}, C = {1, 2, 3, 4}. Find:
(a) P(A ∩ B) and P(A ∪ C) (c) P(A|C) and P(C|A)
(b) P(A|B) and P(B|A) (d) P(B|C) and P(C|B)
4.42. A digit is selected at random from the digits 1 through 9. Consider the events A = {1, 3, 5, 7, 9},
B = {2, 3, 5, 7}, C = {6, 7, 8, 9}. Find:
(a) P(A ∩ B) and P(A ∪ C) (c) P(A|C) and P(C|A)
(b) P(A|B) and P(B|A) (d) P(B|C) and P(C|B)


4.43. A pair of fair dice is tossed. If the faces appearing are different, find the probability that: 


(a) the sum is even, (b) the sum exceeds nine. 


4.44. Let A and B be events with P(A) = 0.6, P(B) = 0.3, and P(A ∩ B) = 0.2. Find:
(a) P(A ∪ B), (b) P(A|B), (c) P(B|A).


4.45. Referring to Problem 4.44, find: (a) P(A ∩ B^c), (b) P(A|B^c).


4.46. Let A and B be events with P(A) = 3, P(B) = 4, and P(A U B) = 3. 
(a) Find P(A|B) and P(B|A). (b) Are A and B independent? 


4.47. A woman is dealt 3 spades from an ordinary deck of 52 cards. (See Fig. 3-4.) If she is given two more
cards, find the probability that both of the cards are also spades. 


4.48. Two marbles are selected one after the other without replacement from a box containing 3 white marbles 
and 2 red marbles. Find the probability p that: 


(a) The two marbles are white. (c) The second is white if the first is white. 
(b) The two marbles are red. (d) The second is red if the first is red. 


4.49. Two marbles are selected one after the other with replacement from a box containing 3 white marbles and 
2 red marbles. Find the probability p that: 


(a) The two marbles are white. (c) The second is white if the first is white. 
(b) The two marbles are red. (d) The second is red if the first is red.


4.50. Two different digits are selected at random from the digits 1 through 5.

(a) If the sum is odd, what is the probability that 2 is one of the numbers selected?
(b) If 2 is one of the digits, what is the probability that the sum is odd?


4.51. Three cards are drawn in succession (without replacement) from a 52-card deck. Find the probability
that:

(a) There are three aces.
(b) If the first is an ace, then the other two are aces.
(c) If the first two are aces, then the third is an ace.


4.52. A die is weighted to yield the following probability distribution:

Number      | 1   2   3   4   5   6
Probability | 0.2 0.1 0.1 0.3 0.1 0.2

Let A = {1, 2, 3}, B = {2, 3, 5}, C = {2, 4, 6}. Find:

(a) P(A), P(B), P(C) (d) P(A|C), P(C|A)
(b) P(A^c), P(B^c), P(C^c) (e) P(B|C), P(C|B)
(c) P(A|B), P(B|A)


4.53. In a country club, 65 percent of the members play tennis, 40 percent play golf, and 20 percent play both
tennis and golf. A member is chosen at random. Find the probability that the member:

(a) Plays tennis or golf. (c) Plays golf if he or she plays tennis.
(b) Plays neither tennis nor golf. (d) Plays tennis if he or she plays golf.


4.54. Suppose 60 percent of the freshmen class of a small college are women. Furthermore, suppose 25 percent
of the men and 10 percent of the women in the class are studying mathematics. A freshman student is
chosen at random. Find the probability that:

(a) The student is studying mathematics.
(b) If the student is studying mathematics, then the student is a woman.


4.55. Three students are selected at random one after another from a class with 10 boys and 5 girls. Find the
probability that:

(a) The first two are boys and the third is a girl.
(b) The first and third are boys and the second is a girl.
(c) All three are of the same sex.
(d) Only the first and third are of the same sex.


FINITE STOCHASTIC PROCESSES 


4.56. Two boxes are given as follows:

Box A contains 5 red marbles, 3 white marbles, and 8 blue marbles.
Box B contains 3 red marbles and 5 white marbles.

A box is selected at random and a marble is randomly chosen. Find the probability that the marble is:
(a) red, (b) white, (c) blue.


4.57. Refer to Problem 4.56. Find the probability that box A was selected if the marble is:
(a) red, (b) white, (c) blue.


4.58. Consider Box A and Box B in Problem 4.56. A fair die is tossed; if a 3 or 6 appears, a marble is randomly
chosen from A, otherwise a marble is chosen from B. Find the probability that the marble is:
(a) red, (b) white, (c) blue.


4.59. Refer to Problem 4.58. Find the probability that box A was selected if the marble is:
(a) red, (b) white, (c) blue.


4.60. A box contains three coins, two of them fair and one two-headed. A coin is randomly selected and tossed
twice. If heads appear both times, what is the probability that the coin is two-headed?


4.61. A box contains a fair coin and a two-headed coin. A coin is selected at random and tossed. If heads
appears, then the other coin is tossed; if tails appears, then the same coin is tossed a second time. Find
the probability that:

(a) Heads appears on the second toss.
(b) If heads appears on the second toss, then it also appeared on the first toss.


4.62. Two boxes are given as follows:

Box A contains x red marbles and y white marbles.
Box B contains z red marbles and t white marbles.

(a) A box is selected at random and a marble is drawn. Find the probability that the marble is red.
(b) A marble is selected from A and put into B, and then a marble is drawn from B. Find the
probability that the marble is red.


TOTAL PROBABILITY AND BAYES’ FORMULA 


4.63. A city is partitioned into districts A, B, C having 20 percent, 40 percent, and 40 percent of the registered
voters, respectively. The registered voters listed as Democrats are 50 percent in A, 25 percent in B, and
75 percent in C. A registered voter is chosen randomly in the city.

(a) Find the probability that the voter is a listed Democrat.
(b) If the registered voter is a listed Democrat, find the probability that the voter came from
district B.


4.64. Refer to Problem 4.63. Suppose a district is chosen at random, and then a registered voter is randomly
chosen from the district.

(a) Find the probability that the voter is a listed Democrat.
(b) If the voter is a listed Democrat, what is the probability that the voter came from district A?


4.65. Women in City College constitute 60 percent of the freshmen, 40 percent of the sophomores, 40 percent
of the juniors, and 45 percent of the seniors. The school population is 30 percent freshmen, 25 percent
sophomores, 25 percent juniors, and 20 percent seniors. A student from City College is chosen at
random.

(a) Find the probability that the student is a woman.
(b) If a student is a woman, what is the probability that she is a sophomore?


4.66. Refer to Problem 4.65. Suppose one of the four classes is chosen at random, and then a student is
randomly chosen from the class.

(a) Find the probability that the student is a woman.
(b) If the student is a woman, what is the probability that she is a sophomore?


4.67. A company produces lightbulbs at three factories A, B, C.

Factory A produces 40 percent of the total number of bulbs, of which 2 percent are defective.
Factory B produces 35 percent of the total number of bulbs, of which 4 percent are defective.
Factory C produces 25 percent of the total number of bulbs, of which 3 percent are defective.

A defective bulb is found among the total output. Find the probability that it came from
(a) factory A, (b) factory B, (c) factory C.


4.68. Refer to Problem 4.67. Suppose a factory is chosen at random, and one of its bulbs is randomly
selected. If the bulb is defective, find the probability that it came from (a) factory A, (b) factory B, (c)
factory C.


INDEPENDENT EVENTS 


4.69. Let A and B be independent events with P(A) = 0.3 and P(B) = 0.4. Find: (a) P(A ∩ B) and P(A ∪ B),
(b) P(A|B) and P(B|A).


4.70. Box A contains 5 red marbles and 3 blue marbles and Box B contains 2 red and 3 blue. A marble is drawn
at random from each box. Find the probability p that (a) Both marbles are red. (b) One is red and one
is blue.


4.71. Box A contains 5 red marbles and 3 blue marbles and Box B contains 2 red and 3 blue. Two marbles are
drawn at random from each box. Find the probability p that (a) They are all red. (b) They are all the
same color.


4.72. Let A and B be independent events with P(A) = 0.2 and P(B) = 0.3. Find:
(a) P(A ∩ B) and P(A ∪ B) (c) P(A|B) and P(B|A)
(b) P(A ∩ B^c) and P(A ∪ B^c) (d) P(A|B^c) and P(B^c|A)


4.73. Let A and B be events with P(A) = 0.3, P(A ∪ B) = 0.5, and P(B) = p. Find p if:
(a) A and B are disjoint, (b) A and B are independent, (c) A is a subset of B.


4.74. The probability that A hits a target is 1/4 and the probability that B hits a target is 1/3. They each fire
once at the target. Find the probability that

(a) They both hit the target.
(b) The target is hit exactly once.
(c) If the target is hit only once, then A hit the target.


4.75. The probability that A hits a target is 1/4 and the probability that B hits a target is 1/3. They each fire
twice. Find the probability that the target is hit: (a) at least once, (b) exactly once.


4.76. The probabilities that three men hit a target are, respectively, 0.3, 0.5, and 0.4. Each fires once at the
target. (As usual, assume that the three events that each hits the target are independent.)

(a) Find the probability that they all: (i) hit the target, (ii) miss the target.
(b) Find the probability that the target is hit: (i) at least once, (ii) exactly once.
(c) If only one hits the target, what is the probability that it was the first man?


4.77. Three fair coins are tossed. Consider the events:
A = {all heads or all tails}, B = {at least two heads}, C = {at most two heads}
Of the pairs (A, B), (A, C), and (B, C), which are independent?


4.78. Suppose A and B are independent events. Show that A and B^c are independent, and that A^c and B are
independent.


4.79. Suppose A, B, C are independent events. Show that:
(a) A^c, B, C are independent; (b) A^c, B^c, C are independent; (c) A^c, B^c, C^c are independent.


4.80. Suppose A, B, C are independent events. Show that A and B ∪ C are independent.


INDEPENDENT REPEATED TRIALS 


4.81. Whenever horses a, b, c race together, their respective probabilities of winning are 0.3, 0.5, 0.2. They race
three times.

(a) Find the probability that the same horse wins all three races.
(b) Find the probability that a, b, c each wins one race.


4.82. A team wins (W) with probability 0.5, loses (L) with probability 0.3, and ties (T) with probability 0.2. The
team plays twice. (a) Determine the sample space S and the probability of each elementary event. (b)
Find the probability that the team wins at least once.


4.83. A certain type of missile hits its target with probability p = 1/3. (a) If 3 missiles are fired, find the
probability that the target is hit at least once. (b) Find the minimum number x of missiles that should be
fired so that there is at least a 90 percent probability of hitting the target.


4.84. In any game, the probability that the Hornets (H) will defeat the Rockets (R) is 0.6. Find the probability
that the Hornets will win a best-out-of-three series.


4.85. The batting average of a baseball player is .300. He comes to bat 4 times. Find the probability that he
will get: (a) exactly two hits, (b) at least one hit.


4.86. Consider a countably infinite probability space S = {a_1, a_2, ...}. Let

$$T = S^n = \{(s_1, s_2, \ldots, s_n) : s_i \in S\} \quad \text{and} \quad P(s_1, s_2, \ldots, s_n) = P(s_1) P(s_2) \cdots P(s_n)$$

Show that T is also a countably infinite probability space. (This generalizes the definition of independent
trials to a countably infinite space.)


Answers to Supplementary Problems

4.41. (a) 2/6, 5/6; (b) 2/3, 2/3; (c) 1/2, 2/3; (d) 1/2, 2/3.
4.42. (a) 3/9, 7/9; (b) 3/4, 3/5; (c) 1/2, 2/5; (d) 1/4, 1/4.
4.43. (a) 12/30; (b) 4/30.
4.44. (a) 0.7; (b) 2/3; (c) 1/3.
4.45. (a) 0.4; (b) 4/7.
4.46. (a) 1/3; 1/4; (b) No.
4.47. C(10, 2)/C(49, 2).
4.48. (a) 3/10; (b) 1/10; (c) 1/2; (d) 1/4.
4.49. (a) 9/25; (b) 4/25; (c) 3/5; (d) 2/5.
4.50. (a) 1/3; (b) 3/4.
4.51. (a) 1/(13 · 17 · 25) ≈ 0.018%; (b) 1/1275 ≈ 0.08%; (c) 1/50 = 2%.
4.52. (a) 0.4, 0.3, 0.6; (b) 0.6, 0.7, 0.4; (c) 2/3, 1/2; (d) 1/6, 1/4; (e) 1/6, 1/3.
4.53. (a) 85%; (b) 15%; (c) 20/65 ≈ 30.8%; (d) 1/2 = 50%.
4.54. (a) 16%; (b) 6/16 = 37.5%.
4.55. (a) 15/91 ≈ 16.5%; (b) 15/91 ≈ 16.5%; (c) 5/21 ≈ 23.8%.
4.56. (a) 11/32; (b) 13/32; (c) 8/32.
4.57. (a) 5/11; (b) 3/13; (c) 1.
4.58. (a) 17/48 ≈ 35.4%; (b) 23/48 ≈ 47.9%; (c) 8/48 ≈ 16.7%.
4.59. (a) 5/17 ≈ 29.4%; (b) 3/23 ≈ 13.0%; (c) 1.
4.60. 2/3.
4.61. (a) 5/8; (b) 4/5.
4.62. (a) (1/2)[x/(x + y) + z/(z + t)]; (b) (xz + x + yz)/[(x + y)(z + t + 1)].
4.63. (a) 50%; (b) 20%.
4.64. (a) 50%; (b) 1/3.
4.65. (a) 47%; (b) 10/47 ≈ 21.3%.
4.66. (a) 46.25%; (b) 21.6%.
4.67. (a) 80/295 ≈ 27.1%; (b) 140/295 ≈ 47.5%; (c) 75/295 ≈ 25.4%.
4.68. (a) 2/9; (b) 4/9; (c) 3/9.
4.69. (a) 0.12, 0.58; (b) 0.3, 0.4.
4.70. (a) 1/4; (b) 21/40.
4.71. (a) 1/28; (b) 1/28 + 9/280 = 19/280.
4.72. (a) 0.06, 0.44; (b) 0.14, 0.76; (c) 0.25, 0.30; (d) 0.20, 0.80.
4.73. (a) 0.2; (b) 2/7; (c) 0.5.
4.74. (a) 1/12; (b) 5/12; (c) 2/5.
4.75. (a) 1 - 1/4 = 3/4; (b) 1/6 + 1/4 = 5/12.
4.76. (a) 6%, 21%; (b) 79%, 44%; (c) 9/44 ≈ 20.45%.
4.77. Only A and B are independent.
4.81. (a) P(aaa or bbb or ccc) = 0.16; (b) 6(0.03) = 0.18.
4.82. (a) S = {WW, WL, WT, LW, LL, LT, TW, TL, TT}; 0.25, 0.15, 0.10, 0.15, 0.09, 0.06, 0.10, 0.06, 0.04;
(b) 1 - 0.25 = 0.75.
4.83. (a) 1 - (2/3)^3 = 19/27; (b) (2/3)^x < 10%, so x ≥ 6.
4.84. P(HH or HRH or RHH) = 64.8%.
4.85. (a) 6(0.3)^2(0.7)^2 ≈ 26.5%; (b) 1 - P(MMMM) ≈ 76%.


Random Variables


5.1 INTRODUCTION 


Random variables play an important role in probability. This chapter formally defines a random 
variable and presents its basic properties. The next chapter treats special types of random 
variables. 

A random variable is a special kind of function, so we recall some notation and definitions about
functions. Let S and T be sets. Suppose to each s ∈ S there is assigned a unique element of T; the
collection f of such assignments is called a function from S into T, and it is written


f: S → T


We write f(s) for the element of T that f assigns to s ∈ S, and f(s) is called the image of s under f or
the value of f at s. The image f(A) of any subset A of S, and the pre-image f^{-1}(B) of any subset B
of T are defined as follows:


f(A) = {f(s) : s ∈ A} and f^{-1}(B) = {s : f(s) ∈ B}


In words, f(A) consists of the images of the points in A, and f^{-1}(B) consists of those points in S whose
image belongs to B. In particular, the set f(S) of all the image points of elements in S is called the
image set (or image or range) of the function f.


5.2 RANDOM VARIABLES 


Let S be a sample space of an experiment. As noted previously, the outcomes of the experiment,
or the points in S, need not be numbers. For example, in tossing a coin, the outcomes are H (heads)
or T (tails), and in tossing a pair of dice, the outcomes are pairs of integers. However, we frequently
wish to assign a specific number to each outcome of the experiment. For example, in the tossing of 
a pair of dice, we may want to assign the sum of the two integers to the outcome. Such an assignment 
of numerical values to the points in S is called a random variable. Specifically, we have the following 
definition. 


Definition: A random variable X on a sample space S is a function from S into the set R of real 
numbers such that the pre-image of any interval of R is an event in S. 


We emphasize that if S is a discrete sample space in which every subset of S is an event, then 
clearly every real-valued function on S is a random variable. On the other hand, if S is uncountable, 
then it can be shown that certain real-valued functions on S are not random variables.

The notation R_X will be used to denote the image of a random variable X, that is, R_X is the set of
those numbers assigned by X to a sample space S. We will refer to R_X as the range space of X. This
chapter will mainly investigate discrete random variables, where the range space R_X is finite or
countable. Continuous random variables are those where the range space R_X is a continuum of
numbers such as an interval or a union of intervals. Such random variables, which may require some
calculus for their investigation, will be treated near the end of the chapter.


EXAMPLE 5.1 A pair of fair dice is tossed. (See Example 3.2.) The sample space S consists of the 36 ordered 
pairs (a, b) where a and b can be any integers between 1 and 6, that is, 


S = {(1,1), (1,2), ..., (6,6)}
Let X assign to each point (a, b) the maximum of its numbers, that is, X(a, b) = max(a, b). For example,
X(1,1) = 1, X(3,4) = 4, X(5,2) = 5, X(6,6) = 6


Then X is a random variable where any number between 1 and 6 could occur, and no other number can
occur. Thus, the range space R_X of X is as follows:


R_X = {1, 2, 3, 4, 5, 6}
Now let Y assign to each point (a, b) the sum of its numbers, that is, Y(a, b) = a + b. For example,
Y(1,1) = 2, Y(3,4) = 7, Y(6,3) = 9, Y(6,6) = 12


Then, Y is a random variable where any number between 2 and 12 could occur, and no other number can
occur. Thus, the range space R_Y of Y is as follows:


R_Y = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}
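
Since a random variable is just a function on the sample space, Example 5.1 translates almost literally into code. The sketch below is illustrative only (the names `S`, `X`, `Y`, `range_X`, `range_Y` and `preimage` are chosen here, not taken from the text); it builds the 36 points, evaluates both functions, and also lists one pre-image of an interval, which is the kind of set the definition requires to be an event.

```python
from itertools import product

# Sample space for a pair of dice: the 36 ordered pairs (a, b).
S = list(product(range(1, 7), repeat=2))


def X(point):
    """Maximum of the two numbers."""
    a, b = point
    return max(a, b)


def Y(point):
    """Sum of the two numbers."""
    a, b = point
    return a + b


# Range spaces: the sets of values actually taken on S.
range_X = sorted({X(s) for s in S})
range_Y = sorted({Y(s) for s in S})

# Pre-image of the interval [4, 5] under X: all points whose maximum is 4 or 5.
preimage = [s for s in S if 4 <= X(s) <= 5]

print(range_X)        # [1, 2, 3, 4, 5, 6]
print(range_Y)        # [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
print(len(preimage))  # 16
```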


EXAMPLE 5.2 


(a) A box contains 12 items of which 3 are defective. A sample of 3 items is selected from the box. The
sample space S consists of the C(12, 3) = 220 different samples of size 3. Let X denote the number of
defective items in the sample; then X is a random variable with range space R_X = {0, 1, 2, 3}.


(b) A coin is tossed until a head occurs. The sample space follows:
S = {H, TH, TTH, TTTH, TTTTH, ...}
Let X denote the number of times the coin is tossed. Then, X is a random variable with range space
R_X = {1, 2, 3, 4, ..., ∞}


(We include the number ∞ for the case that only tails occurs.) Here X is an infinite but discrete random
variable.


(c) A point is chosen in a circle C of radius r. Let X denote the distance of the point from the center. Then,
X is a random variable whose value can be any number between 0 and r, inclusive. Thus, the range space
R_X of X is a closed interval:


R_X = [0, r] = {x : 0 ≤ x ≤ r}


Here, X is a continuous random variable.


Sums and Products of Random Variables


Let X and Y be random variables on the same sample space S. Then X + Y, X + k, kX, and XY
(where k is a real number) are the functions on S defined as follows (where s is any point in S):


(X + Y)(s) = X(s) + Y(s), (kX)(s) = kX(s),
(X + k)(s) = X(s) + k, (XY)(s) = X(s)Y(s)


More generally, for any polynomial, exponential, or continuous function h(t), we define h(X) to be the
function on S defined by


[h(X)](s) = h[X(s)]
One can show that these are also random variables on S. (This is trivially true for the case that every
subset of S is an event.)
We use the short notation P(X = a) and P(a ≤ X ≤ b), respectively, for the probability that "X
maps into a" and "X maps into the interval [a, b]". That is,
P(X = a) is short for P({s ∈ S : X(s) = a})
P(a ≤ X ≤ b) is short for P({s ∈ S : a ≤ X(s) ≤ b})
Analogous meanings are given to


P(X ≤ a), P(X = a, Y = b), P(a ≤ X ≤ b, c ≤ Y ≤ d)


and so on.


5.3 PROBABILITY DISTRIBUTION OF A FINITE RANDOM VARIABLE


Let X be a finite random variable on a sample space S, that is, X assigns only a finite number of
values to S. Say,


R_X = {x_1, x_2, ..., x_n}


(We assume that x_1 < x_2 < ... < x_n.) Then, X induces a function f which assigns probabilities to the
points in R_X as follows:


f(x_k) = P(X = x_k) = P({s ∈ S : X(s) = x_k})
The set of ordered pairs [x_k, f(x_k)] is usually given in the form of a table as follows:


x    | x_1    x_2    x_3    ...  x_n
f(x) | f(x_1) f(x_2) f(x_3) ...  f(x_n)


This function f is called the probability distribution or, simply, distribution, of the random variable X;
it satisfies the following two conditions:


(i) f(x_k) ≥ 0 and (ii) $\sum_k f(x_k) = 1$

Accordingly, the range space R_X with the above assignment of probabilities is a probability space.


Remark: It is convenient sometimes to extend a probability distribution f to all real numbers by
defining f(x) = 0 when x does not belong to R_X. A graph of such a function f(x) is called a probability
graph.

Notation: Sometimes a probability distribution will be given using the pairs [x_i, p_i] or [x_i, P(x_i)] or
[x, P(X = x)] rather than the functional notation [x, f(x)].


Equiprobable Spaces


Now suppose X is a random variable on a finite equiprobable space S. Then, X is a finite random
variable, and the following theorem tells us how to obtain the distribution of X:


Theorem 5.1: Let S be a finite equiprobable space, and let X be a random variable on S with range
space R_X = {x_1, x_2, ..., x_n}. Then

$$p_k = P(x_k) = \frac{\text{number of points in } S \text{ whose image is } x_k}{\text{number of points in } S}$$

The proof appears in Problem 5.41. It essentially follows from the fact that S is an equiprobable
space, and hence

$$p_k = P(\{s : X(s) = x_k\}) = \frac{n(\{s : X(s) = x_k\})}{n(S)}$$

We apply this theorem in the next examples.


EXAMPLE 5.3 Let S be the sample space when a pair of fair dice is tossed. Then S is a finite equiprobable 
space consisting of the 36 ordered pairs (a, b) where a and b are any integers between 1 and 6: 


S = {(1,1), (1,2), (1,3), ..., (6,6)}


Let X and Y be the random variables on S in Example 5.1, that is, X denotes the maximum of the numbers, 
X(a, b) = max(a, b), and Y denotes the sum of the numbers, Y(a, b) = a+ b. 


(a) Find the distribution f of X. (b) Find the distribution g of Y. 


Also, exhibit their probability graphs. 
Here S is an equiprobable space with 36 points so we can use Theorem 5.1 and simply count the number of 
points with the given numerical value. 


(a) Random Variable X. We compute the distribution f of X as follows:
(1) Only one toss (1,1) has the maximum value 1; hence f(1) = 1/36.
(2) Three tosses, (1,2), (2,2), (2,1), have the maximum value 2; hence f(2) = 3/36.
(3) Five tosses, (1,3), (2,3), (3,3), (3,2), (3,1), have the maximum value 3; hence f(3) = 5/36.

Similarly, f(4) = 7/36, f(5) = 9/36, f(6) = 11/36. Thus, the distribution f of X is as follows:

x    | 1    2    3    4    5    6
f(x) | 1/36 3/36 5/36 7/36 9/36 11/36


The probability graph of X is pictured in Fig. 5-1(a).


(b) Random Variable Y. The distribution g of the random variable Y is as follows:


y    | 2    3    4    5    6    7    8    9    10   11   12
g(y) | 1/36 2/36 3/36 4/36 5/36 6/36 5/36 4/36 3/36 2/36 1/36


We obtain, for example, g(6) = 5/36 from the fact that exactly five of the tosses have sum 6:
(1,5), (2,4), (3,3), (4,2), (5,1)
The other entries are obtained similarly. The probability graph of Y is pictured in Fig. 5-1(b).


EXAMPLE 5.4 Suppose a fair coin is tossed three times yielding the following sample space:
S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}


Let X be the random variable which assigns to each point in S the number of heads. Then clearly X can only
be 0, 1, 2, or 3. That is, the following is its range space:


R_X = {0, 1, 2, 3}


Observe that:


(i) There is only one point TTT where X = 0.
(ii) There are three points, HTT, THT, TTH, where X = 1.
(iii) There are three points, HHT, HTH, THH, where X = 2.
(iv) There is only one point HHH where X = 3.


Since the coin is fair, S is an 8-element equiprobable space. Hence Theorem 5.1 tells us that the distribution f
of X is as follows:


x    | 0   1   2   3
f(x) | 1/8 3/8 3/8 1/8
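
The counting rule of Theorem 5.1 is mechanical enough to automate. The sketch below (the helper name `distribution` is hypothetical) counts, for each value, the points of an equiprobable space that map to it, and reproduces the tables of Examples 5.3 and 5.4. Note that `Fraction` reduces automatically, so 3/36 is stored as 1/12.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def distribution(random_variable, space):
    """Theorem 5.1: f(x) = (number of points with image x) / (number of points in S)."""
    counts = Counter(random_variable(point) for point in space)
    return {x: Fraction(counts[x], len(space)) for x in sorted(counts)}


dice = list(product(range(1, 7), repeat=2))   # sample space of Example 5.3
coins = list(product("HT", repeat=3))         # sample space of Example 5.4

f = distribution(max, dice)                                  # maximum of the dice
g = distribution(sum, dice)                                  # sum of the dice
h = distribution(lambda point: point.count("H"), coins)      # number of heads

print(f[6], g[6], h[1])  # 11/36 5/36 3/8
```

The same helper would not be valid for a space that is not equiprobable; in that case the probabilities of the individual points have to be added instead of counted.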


EXAMPLE 5.5 Suppose a coin is tossed three times, but now suppose the coin is weighted so that P(H) = 2/3 and
P(T) = 1/3. The sample space is again


S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}


Let X be the random variable which assigns to each point in S its number of heads. Find the distribution f of X.
Now S is not an equiprobable space. Specifically, the probabilities of the points in S are as follows:


P(HHH) = 8/27, P(HHT) = 4/27, P(HTH) = 4/27, P(HTT) = 2/27,
P(THH) = 4/27, P(THT) = 2/27, P(TTH) = 2/27, P(TTT) = 1/27


Since S is not an equiprobable space, we cannot use Theorem 5.1 to find the distribution f of X. We find
f directly by using its definition. Namely,


f(0) = P(TTT) = 1/27
f(1) = P({HTT, THT, TTH}) = 2/27 + 2/27 + 2/27 = 6/27
f(2) = P({HHT, HTH, THH}) = 4/27 + 4/27 + 4/27 = 12/27
f(3) = P(HHH) = 8/27


Thus, the distribution f of X is as follows:

x    | 0    1    2     3
f(x) | 1/27 6/27 12/27 8/27


The probability graph of f is shown in Fig. 5-2(a). An alternate picture of f is by a histogram which appears in
Fig. 5-2(b). One may view the histogram as making the random variable continuous where X = 1 means X lies
between 0.5 and 1.5.
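
When the space is not equiprobable, the distribution is obtained by adding point probabilities rather than counting points. The following is a minimal sketch for the weighted coin of Example 5.5, assuming independent tosses with P(H) = 2/3; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

p = {"H": Fraction(2, 3), "T": Fraction(1, 3)}   # weighted coin of Example 5.5

# f[k] accumulates the probability of every outcome with exactly k heads.
f = {k: Fraction(0) for k in range(4)}
for outcome in product("HT", repeat=3):
    prob = p[outcome[0]] * p[outcome[1]] * p[outcome[2]]   # independent tosses
    f[outcome.count("H")] += prob

print(f[0], f[1], f[2], f[3])  # 1/27 2/9 4/9 8/27
```

The printed values 2/9 and 4/9 are the reduced forms of 6/27 and 12/27.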


Fig. 5-2 (a) Probability graph of f. (b) Histogram of f.


5.4 EXPECTATION OF A FINITE RANDOM VARIABLE 


Let X be a finite random variable, and suppose the following is its distribution: 


x    | x_1    x_2    x_3    ...  x_n
f(x) | f(x_1) f(x_2) f(x_3) ...  f(x_n)


Then the mean, or expectation (or expected value) of X, denoted by E(X), or simply E, is defined by

$$E = E(X) = x_1 f(x_1) + x_2 f(x_2) + \cdots + x_n f(x_n) = \sum x_i f(x_i)$$

Equivalently, when the notation [x_i, p_i] is used instead of [x, f(x)],

$$E = E(X) = x_1 p_1 + x_2 p_2 + \cdots + x_n p_n = \sum x_i p_i$$

Roughly speaking, if the x_i are numerical outcomes of an experiment, then E is the expected value of
the experiment. We may also view E as the weighted average of the outcomes where each outcome
is weighted by its probability. (For notational convenience we omit the limits of the index in the above
summations.)




EXAMPLE 5.6 A pair of fair dice are tossed. Let X and Y be the random variables in Example 5.1, that is, 
X denotes the maximum of the numbers, X(a,b) = max(a,b) and Y denotes the sum of the numbers, 
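Y(a,b) = a + b. Using the distribution of X, which appears in Example 5.3, the expectation of X is computed
as follows:

$$E(X) = 1\left(\frac{1}{36}\right) + 2\left(\frac{3}{36}\right) + 3\left(\frac{5}{36}\right) + 4\left(\frac{7}{36}\right) + 5\left(\frac{9}{36}\right) + 6\left(\frac{11}{36}\right) = \frac{161}{36} \approx 4.47$$

Using the distribution of Y, which also appears in Example 5.3, the expectation of Y is computed as follows:

$$E(Y) = 2\left(\frac{1}{36}\right) + 3\left(\frac{2}{36}\right) + 4\left(\frac{3}{36}\right) + \cdots + 12\left(\frac{1}{36}\right) = \frac{252}{36} = 7$$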
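
On an equiprobable space, the expectation is simply the average of the random variable over all sample points, which gives a quick cross-check of the two values above. A minimal sketch (the helper name `expectation` is hypothetical):

```python
from fractions import Fraction
from itertools import product

# The 36 equally likely outcomes for a pair of fair dice.
S = list(product(range(1, 7), repeat=2))


def expectation(random_variable, space):
    """E(X) on a finite equiprobable space: the average of X over all points."""
    return Fraction(sum(random_variable(s) for s in space), len(space))


E_X = expectation(max, S)   # X = maximum of the two numbers
E_Y = expectation(sum, S)   # Y = sum of the two numbers

print(E_X, float(E_X))  # 161/36 4.472222222222222
print(E_Y)              # 7
```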
EXAMPLE 5.7 Let X and Y be random variables with the following respective distributions:

x_i | 2   3   6   10          y_i | -8  -2  0   3   7
p_i | 0.2 0.2 0.5 0.1         p_i | 0.2 0.3 0.1 0.3 0.1


Then

$$E(X) = \sum x_i p_i = 2(0.2) + 3(0.2) + 6(0.5) + 10(0.1) = 0.4 + 0.6 + 3.0 + 1.0 = 5$$

$$E(Y) = \sum y_i p_i = -8(0.2) - 2(0.3) + 0(0.1) + 3(0.3) + 7(0.1) = -1.6 - 0.6 + 0 + 0.9 + 0.7 = -0.6$$


Remark: The above Example 5.7 shows that the expectation of a random variable may be 
negative. It also shows that we can talk about the distribution and expectation of a random variable 
X without any reference to the original sample space S. 


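EXAMPLE 5.8 Suppose a fair coin is tossed 6 times. One can show (Section 6.2) that the number x_i of heads
occurs with probability p_i as follows:

x_i | 0    1    2     3     4     5    6
p_i | 1/64 6/64 15/64 20/64 15/64 6/64 1/64

Then the expected number E of heads is as follows:

$$E = 0\left(\frac{1}{64}\right) + 1\left(\frac{6}{64}\right) + 2\left(\frac{15}{64}\right) + 3\left(\frac{20}{64}\right) + 4\left(\frac{15}{64}\right) + 5\left(\frac{6}{64}\right) + 6\left(\frac{1}{64}\right) = \frac{192}{64} = 3$$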


This agrees with our intuition that, when a fair coin is repeatedly tossed, about half of the tosses should be 
heads. 


The following theorems (proved in Problems 5.44 and 5.45) relate the notion of expectation to 
operations on random variables defined in Section 5.2. 


Theorem 5.2: Let X be a random variable and let k be a real number. Then
(i) E(kX) = kE(X) and (ii) E(X + k) = E(X) + k
Thus, for any real numbers a and b,
E(aX + b) = E(aX) + b = aE(X) + b
Theorem 5.3: Let X and Y be random variables on the same sample space S. Then


E(X + Y) = E(X) + E(Y)
A simple induction argument yields the following:
Corollary 5.4: Let X_1, X_2, ..., X_n be random variables on the same sample space S. Then


E(X_1 + X_2 + ... + X_n) = E(X_1) + E(X_2) + ... + E(X_n)


Expectation and Games of Chance 


Frequently, a game of chance consists of n outcomes a_1, a_2, ..., a_n occurring with respective
probabilities p_1, p_2, ..., p_n. Suppose the payoff to a player is w_i for the outcome a_i, where a positive
w_i denotes a win for the player and a negative w_i denotes a loss. Then the expected value E of the
game for the player is the quantity


E = w_1 p_1 + w_2 p_2 + ... + w_n p_n


The assignment of w_i to a_i may be viewed as a random variable X, and the expectation E(X) of X is
the expected value of the game. The game is fair if E = 0, favorable to the player if E is positive, and
unfavorable to the player if E is negative.


EXAMPLE 5.9 A fair die is tossed. If 2, 3, or 5 occurs, the player wins that number of dollars, but if 1, 4, or
6 occurs, the player loses that number of dollars. The possible payoffs for the player and their respective
probabilities follow:

x_i | 2   3   5   -1  -4  -6
p_i | 1/6 1/6 1/6 1/6 1/6 1/6

The negative numbers -1, -4, -6 refer to the fact that the player loses when 1, 4, or 6 occurs. Then the
expected value E of the game is as follows:

$$E = 2\left(\frac{1}{6}\right) + 3\left(\frac{1}{6}\right) + 5\left(\frac{1}{6}\right) - 1\left(\frac{1}{6}\right) - 4\left(\frac{1}{6}\right) - 6\left(\frac{1}{6}\right) = -\frac{1}{6}$$

Thus, the game is unfavorable to the player since the expected value E is negative.
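
The expected value of such a game is a one-line weighted sum. The sketch below encodes the payoff rule of Example 5.9 as a dictionary from die face to payoff (a representation chosen here for illustration) and classifies the game by the sign of E.

```python
from fractions import Fraction

# Payoff in dollars for each face of a fair die (Example 5.9):
# win the face value on 2, 3, 5; lose the face value on 1, 4, 6.
payoffs = {1: -1, 2: 2, 3: 3, 4: -4, 5: 5, 6: -6}

expected_value = sum(Fraction(1, 6) * w for w in payoffs.values())

if expected_value == 0:
    verdict = "fair"
elif expected_value > 0:
    verdict = "favorable"
else:
    verdict = "unfavorable"

print(expected_value, verdict)  # -1/6 unfavorable
```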


Mean and Expectation 


Suppose X is a random variable with n distinct values x_1, x_2, ..., x_n, and suppose each x_i occurs with
the same probability p_i. Then each p_i = 1/n. Accordingly

$$E(X) = x_1\left(\frac{1}{n}\right) + x_2\left(\frac{1}{n}\right) + \cdots + x_n\left(\frac{1}{n}\right) = \frac{x_1 + x_2 + \cdots + x_n}{n}$$

This is precisely the average or mean value of the numbers x_1, x_2, ..., x_n. (See Appendix A.) For
this reason E(X) is called the mean of the random variable X. Furthermore, since the Greek letter
μ (read "mu") is used for the mean value of a population, we also use μ for the expectation of
X. That is,

$$\mu = \mu_X = E(X)$$

The mean μ is an important parameter for a probability distribution, and in the next section,
Section 5.5, we introduce another important parameter, denoted by the Greek letter σ (read "sigma"),
called the standard deviation of X.


5.5 VARIANCE AND STANDARD DEVIATION


The mean of a random variable X measures, in a certain sense, the "average" value of X. The
concepts in this section, variance and standard deviation, measure the "spread" or "dispersion"
of X.

Let X be a random variable with mean μ = E(X) and the following probability distribution:


x    | x_1    x_2    x_3    ...  x_n
f(x) | f(x_1) f(x_2) f(x_3) ...  f(x_n)


The variance of X, denoted by var(X), is defined by

$$\operatorname{var}(X) = (x_1 - \mu)^2 f(x_1) + (x_2 - \mu)^2 f(x_2) + \cdots + (x_n - \mu)^2 f(x_n) = \sum (x_i - \mu)^2 f(x_i) = E((X - \mu)^2)$$

The standard deviation of X, denoted by σ_X or simply σ, is the nonnegative square root of var(X),
that is

$$\sigma_X = \sqrt{\operatorname{var}(X)}$$

Accordingly, var(X) = σ_X^2. Both var(X) and σ_X^2, or simply σ^2, are used to denote the variance of a
random variable X.


The next theorem gives us an alternate and sometimes more useful formula for calculating the 
variance of a random variable X. 


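Theorem 5.5: $\operatorname{var}(X) = x_1^2 f(x_1) + x_2^2 f(x_2) + \cdots + x_n^2 f(x_n) - \mu^2 = \sum x_i^2 f(x_i) - \mu^2 = E(X^2) - \mu^2$
Proof: Using $\sum x_i f(x_i) = \mu$ and $\sum f(x_i) = 1$, we obtain

$$\sum (x_i - \mu)^2 f(x_i) = \sum (x_i^2 - 2\mu x_i + \mu^2) f(x_i)$$
$$= \sum x_i^2 f(x_i) - 2\mu \sum x_i f(x_i) + \mu^2 \sum f(x_i)$$
$$= \sum x_i^2 f(x_i) - 2\mu^2 + \mu^2 = \sum x_i^2 f(x_i) - \mu^2$$

This proves the theorem.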


Remark: Both the variance var(X) = σ^2 and the standard deviation σ measure the weighted
spread of the values x_i about the mean μ; however, one advantage of the standard deviation σ is that
it has the same units as μ.


EXAMPLE 5.10 Let X denote the number of times heads occurs when a fair coin is tossed 6 times. The
distribution of X appears in Example 5.8 where its mean μ = 3 is computed. The variance of X is computed using
its definition as follows:

$$\operatorname{var}(X) = (0 - 3)^2 \frac{1}{64} + (1 - 3)^2 \frac{6}{64} + (2 - 3)^2 \frac{15}{64} + \cdots + (6 - 3)^2 \frac{1}{64} = 1.5$$

Alternately, by Theorem 5.5,

$$E(X^2) = 0^2 \frac{1}{64} + 1^2 \frac{6}{64} + 2^2 \frac{15}{64} + 3^2 \frac{20}{64} + 4^2 \frac{15}{64} + 5^2 \frac{6}{64} + 6^2 \frac{1}{64} = 10.5$$

and hence var(X) = E(X^2) - μ^2 = 10.5 - 3^2 = 1.5.

Thus, the standard deviation is σ = √1.5 ≈ 1.225 (heads).

EXAMPLE 5.11 A pair of fair dice are tossed. Let X and Y be the random variables in Example 5.1, that is,
X denotes the maximum of the numbers, X(a,b) = max(a,b), and Y denotes the sum of the numbers,
Y(a,b) = a + b. The distributions of X and Y appear in Example 5.3 and their expectations were computed in
Example 5.6 yielding:


μ_X = E(X) = 4.47 and μ_Y = E(Y) = 7


Find the variance and standard deviation of: (a) X, (b) Y.


(a) First we compute E(X^2) as follows:

$$E(X^2) = \sum x_i^2 f(x_i) = 1^2\left(\frac{1}{36}\right) + 2^2\left(\frac{3}{36}\right) + 3^2\left(\frac{5}{36}\right) + 4^2\left(\frac{7}{36}\right) + 5^2\left(\frac{9}{36}\right) + 6^2\left(\frac{11}{36}\right) = \frac{791}{36} \approx 21.97$$

Hence

var(X) = E(X^2) - μ_X^2 = 21.97 - 19.98 = 1.99 and σ_X = √1.99 ≈ 1.4


(b) First we compute E(Y^2) as follows:

$$E(Y^2) = \sum y_i^2 g(y_i) = 2^2\left(\frac{1}{36}\right) + 3^2\left(\frac{2}{36}\right) + 4^2\left(\frac{3}{36}\right) + \cdots + 12^2\left(\frac{1}{36}\right) = \frac{1974}{36} \approx 54.8$$

Hence

var(Y) = E(Y^2) - μ_Y^2 = 54.8 - 49 = 5.8 and σ_Y = √5.8 ≈ 2.4
We establish (Problem 5.46) an important property of the variance and standard deviation. 


Theorem 5.6: Let X be a random variable and let a and b be constants. Then

$$\operatorname{var}(aX + b) = a^2 \operatorname{var}(X) \quad\text{and}\quad \sigma_{aX+b} = |a|\,\sigma_X$$

There are two special cases of Theorem 5.6 which occur frequently; the first where a = 1 and the
second where b = 0. Specifically, for any constant k

(i) $\operatorname{var}(X + k) = \operatorname{var}(X)$ and hence $\sigma_{X+k} = \sigma_X$.
(ii) $\operatorname{var}(kX) = k^2 \operatorname{var}(X)$ and hence $\sigma_{kX} = |k|\,\sigma_X$.
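
Theorem 5.6 can be checked numerically on any small finite distribution. The sketch below is one such check; the distribution `dist_x`, the constants `a` and `b`, and the helper names `variance` and `transform` are hypothetical and chosen only for illustration.

```python
from fractions import Fraction

def variance(dist):
    """Variance of a finite distribution given as {value: probability}."""
    mean = sum(x * p for x, p in dist.items())
    return sum((x - mean) ** 2 * p for x, p in dist.items())

def transform(dist, a, b):
    """Distribution of aX + b (assumes a != 0, so distinct values stay distinct)."""
    return {a * x + b: p for x, p in dist.items()}

# Hypothetical distribution, not taken from the text.
dist_x = {1: Fraction(1, 4), 2: Fraction(1, 2), 5: Fraction(1, 4)}
a, b = -3, 7

var_x = variance(dist_x)
var_ax_b = variance(transform(dist_x, a, b))

# Shifting by b has no effect; scaling by a multiplies the variance by a**2.
print(var_x, var_ax_b, a**2 * var_x)
```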


Remark: There are physical interpretations of the mean and variance. Suppose the x axis is a
thin wire and at each point $x_i$ there is a unit with mass $p_i$. Then, if a fulcrum or pivot is placed at the
point $\mu$ [Fig. 5-3(a)], the system will be balanced. Hence $\mu$ is called the center of mass of the
system. On the other hand, if the system were rotating about the center of mass $\mu$ [Fig. 5-3(b)], then
the variance $\sigma^2$ measures the system's resistance to stopping, called the moment of inertia of the
system.

Fig. 5-3. (a) and (b)

Standardized Random Variable

Let X be a random variable with mean $\mu$ and standard deviation $\sigma > 0$. The standardized
random variable Z is defined by

$$Z = \frac{X - \mu}{\sigma}$$

Important properties of Z are contained in the next theorem (proved in Problem 5.48).

Theorem 5.7: The standardized random variable Z has mean $\mu_Z = 0$ and standard deviation
$\sigma_Z = 1$.


EXAMPLE 5.12 Suppose a random variable X has the following distribution:

x    | 2    4    6    8
f(x) | 0.1  0.2  0.3  0.4

(a) Compute the mean $\mu$ and standard deviation $\sigma$ of X.
(b) Find the probability distribution of the standardized random variable $Z = (X - \mu)/\sigma$, and show that $\mu_Z = 0$
and $\sigma_Z = 1$, as predicted by Theorem 5.7.

(a) We have:

$$\mu = E(X) = \sum x_i f(x_i) = 2(0.1) + 4(0.2) + 6(0.3) + 8(0.4) = 6$$

$$E(X^2) = \sum x_i^2 f(x_i) = 2^2(0.1) + 4^2(0.2) + 6^2(0.3) + 8^2(0.4) = 40$$

Now using Theorem 5.5, we obtain

$$\sigma^2 = \operatorname{var}(X) = E(X^2) - \mu^2 = 40 - 6^2 = 4 \quad\text{and}\quad \sigma = 2$$

(b) Using $z = (x - \mu)/\sigma = (x - 6)/2$ and $f(z) = f(x)$, we obtain the following distribution for Z:

z    | -2   -1   0    1
f(z) | 0.1  0.2  0.3  0.4

Then

$$\mu_Z = E(Z) = \sum z_i f(z_i) = -2(0.1) - 1(0.2) + 0(0.3) + 1(0.4) = 0$$

$$E(Z^2) = \sum z_i^2 f(z_i) = (-2)^2(0.1) + (-1)^2(0.2) + 0^2(0.3) + 1^2(0.4) = 1$$

Using Theorem 5.5, we obtain

$$\sigma_Z^2 = \operatorname{var}(Z) = E(Z^2) - \mu_Z^2 = 1 - 0^2 = 1 \quad\text{and}\quad \sigma_Z = 1$$

(The results $\mu_Z = 0$ and $\sigma_Z = 1$ were predicted by Theorem 5.7.)
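
The standardization in Example 5.12 can be sketched in a few lines. The helper name `mean_and_sd` is an assumption made for this illustration, and floating-point arithmetic is used, so the results are compared with a tolerance rather than exactly.

```python
from math import isclose, sqrt

# Distribution of X from Example 5.12, as {value: probability}.
dist_x = {2: 0.1, 4: 0.2, 6: 0.3, 8: 0.4}

def mean_and_sd(dist):
    """Return (mean, standard deviation) using var = E(X^2) - mean^2."""
    mu = sum(x * p for x, p in dist.items())
    var = sum(x**2 * p for x, p in dist.items()) - mu**2
    return mu, sqrt(var)

mu, sigma = mean_and_sd(dist_x)

# Each value x is mapped to z = (x - mu) / sigma and keeps its probability.
dist_z = {(x - mu) / sigma: p for x, p in dist_x.items()}
mu_z, sigma_z = mean_and_sd(dist_z)

print(mu, sigma, mu_z, sigma_z)
```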


5.6 JOINT DISTRIBUTION OF RANDOM VARIABLES 
Let X and Y be random variables on the same sample space S with respective range spaces

$$R_X = \{x_1, x_2, \ldots, x_n\} \quad\text{and}\quad R_Y = \{y_1, y_2, \ldots, y_m\}$$

The joint distribution or joint probability function of X and Y is the function h on the product space
$R_X \times R_Y$ defined by

$$h(x_i, y_j) = P(X = x_i, Y = y_j) = P(\{s \in S : X(s) = x_i,\ Y(s) = y_j\})$$

The function h is usually given in the form of a table as in Fig. 5-4. The function h has the
properties

(i) $h(x_i, y_j) \ge 0$, (ii) $\sum_i \sum_j h(x_i, y_j) = 1$

Thus, h defines a probability space on the product space $R_X \times R_Y$.

X \ Y | y_1          y_2          ...  y_j          ...  y_m          | Sum
x_1   | h(x_1, y_1)  h(x_1, y_2)  ...  h(x_1, y_j)  ...  h(x_1, y_m)  | f(x_1)
x_2   | h(x_2, y_1)  h(x_2, y_2)  ...  h(x_2, y_j)  ...  h(x_2, y_m)  | f(x_2)
...   | ...
x_n   | h(x_n, y_1)  h(x_n, y_2)  ...  h(x_n, y_j)  ...  h(x_n, y_m)  | f(x_n)
Sum   | g(y_1)       g(y_2)       ...  g(y_j)       ...  g(y_m)       |

Fig. 5-4

The functions f and g on the right side and the bottom side, respectively, of the joint distribution
table in Fig. 5-4 are defined by

$$f(x_i) = \sum_j h(x_i, y_j) \quad\text{and}\quad g(y_j) = \sum_i h(x_i, y_j)$$

That is, $f(x_i)$ is the sum of the entries in the ith row and $g(y_j)$ is the sum of the entries in the jth
column. They are called the marginal distributions, and are, in fact, the (individual) distributions of
X and Y, respectively (Problem 5.42).


Covariance and Correlation 


Let X and Y be random variables with the joint distribution h(x, y), and respective means $\mu_X$ and
$\mu_Y$. The covariance of X and Y, denoted by cov(X, Y), is defined by

$$\operatorname{cov}(X, Y) = \sum_{i,j} (x_i - \mu_X)(y_j - \mu_Y)\, h(x_i, y_j) = E[(X - \mu_X)(Y - \mu_Y)]$$

Equivalently (Problem 5.47),

$$\operatorname{cov}(X, Y) = \sum_{i,j} x_i y_j\, h(x_i, y_j) - \mu_X \mu_Y = E(XY) - \mu_X \mu_Y$$

The correlation of X and Y, denoted by $\rho(X, Y)$, is defined by

$$\rho(X, Y) = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y}$$

The correlation $\rho$ is dimensionless and has the following properties:

(i) $\rho(X, Y) = \rho(Y, X)$,
(ii) $-1 \le \rho \le 1$,
(iii) $\rho(X, X) = 1$, $\rho(X, -X) = -1$,
(iv) $\rho(aX + b, cY + d) = \rho(X, Y)$ if a and c have the same sign (ac > 0); the sign of $\rho$ reverses if ac < 0, as in (iii).

We note (Example 5.14) that a pair of random variables with identical individual distributions can
have distinct covariances and correlations. Thus, cov(X, Y) and $\rho(X, Y)$ are measurements of the way
that X and Y are interrelated.
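
The definitions of covariance and correlation translate directly into code. The sketch below applies them to the dice variables treated in Example 5.13 (X the maximum and Y the sum of two fair dice) by enumerating the 36 equally likely outcomes; the helper name `expect` is an assumption made for this illustration.

```python
from fractions import Fraction
from itertools import product
from math import sqrt

# Equiprobable sample space for a pair of fair dice.
outcomes = list(product(range(1, 7), repeat=2))
p = Fraction(1, len(outcomes))

def expect(func):
    """E[func(a, b)] over the 36 equally likely outcomes (a, b)."""
    return sum(func(a, b) * p for a, b in outcomes)

mu_x = expect(lambda a, b: max(a, b))                 # mean of X = max
mu_y = expect(lambda a, b: a + b)                     # mean of Y = sum
e_xy = expect(lambda a, b: max(a, b) * (a + b))       # E(XY)
var_x = expect(lambda a, b: max(a, b) ** 2) - mu_x**2
var_y = expect(lambda a, b: (a + b) ** 2) - mu_y**2

cov = e_xy - mu_x * mu_y                              # cov = E(XY) - mu_X mu_Y
rho = float(cov) / sqrt(var_x * var_y)                # correlation

print(float(e_xy), float(cov), round(rho, 2))
```

Working from the sample space rather than from a joint table avoids building the table by hand, at the cost of recomputing X and Y for every outcome.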


EXAMPLE 5.13 Let S be the sample space when a pair of fair dice is tossed, and let X and Y be the random
variables on S in Example 5.1. That is, to each point (a, b) in S, X assigns the maximum of the numbers and Y
assigns the sum of the numbers:

X(a,b) = max(a,b) and Y(a,b) = a + b

The joint distribution of X and Y appears in Fig. 5-5. The entry h(3,5) = 2/36 comes from the fact that (3,2) and
(2,3) are the only points in S whose maximum number is 3 and whose sum is 5, that is,

$$h(3,5) = P(X = 3, Y = 5) = P(\{(3,2), (2,3)\}) = \frac{2}{36}$$

The other entries are obtained in a similar manner.

X \ Y | 2     3     4     5     6     7     8     9     10    11    12   | Sum
1     | 1/36  0     0     0     0     0     0     0     0     0     0    | 1/36
2     | 0     2/36  1/36  0     0     0     0     0     0     0     0    | 3/36
3     | 0     0     2/36  2/36  1/36  0     0     0     0     0     0    | 5/36
4     | 0     0     0     2/36  2/36  2/36  1/36  0     0     0     0    | 7/36
5     | 0     0     0     0     2/36  2/36  2/36  2/36  1/36  0     0    | 9/36
6     | 0     0     0     0     0     2/36  2/36  2/36  2/36  2/36  1/36 | 11/36
Sum   | 1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36 |

Fig. 5-5

Observe that the right side sum column does give the distribution f of X and the bottom sum row does give
the distribution g of Y which appear in Example 5.3.
We compute the covariance and correlation of X and Y. First we compute E(XY) as follows:

$$E(XY) = \sum x_i y_j\, h(x_i, y_j) = 1(2)\frac{1}{36} + 2(3)\frac{2}{36} + 2(4)\frac{1}{36} + \cdots + 6(12)\frac{1}{36} = \frac{1232}{36} = 34.2$$

By Example 5.6, $\mu_X = 4.47$ and $\mu_Y = 7$, and by Example 5.11, $\sigma_X = 1.4$ and $\sigma_Y = 2.4$; hence

$$\operatorname{cov}(X, Y) = E(XY) - \mu_X \mu_Y = 34.2 - (4.47)(7) = 2.9$$

and

$$\rho(X, Y) = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{2.9}{(1.4)(2.4)} \approx 0.86$$

EXAMPLE 5.14 Let X and Y be random variables with joint distribution as shown in Fig. 5-6(a), and let X' and
Y' be random variables with joint distribution as shown in Fig. 5-6(b). The marginal entries in Fig. 5-6 tell us that
X and X' have the same distribution, and that Y and Y' have the same distribution, as follows:

x    | 1    3           y    | 4    10
f(x) | 1/2  1/2         g(y) | 1/2  1/2
Distribution of X and X'     Distribution of Y and Y'

Thus

$$\mu_X = \mu_{X'} = 1\left(\tfrac{1}{2}\right) + 3\left(\tfrac{1}{2}\right) = 2 \quad\text{and}\quad \mu_Y = \mu_{Y'} = 4\left(\tfrac{1}{2}\right) + 10\left(\tfrac{1}{2}\right) = 7$$

We show that cov(X, Y) ≠ cov(X', Y') and hence $\rho(X, Y) \ne \rho(X', Y')$. First we compute E(XY) and
E(X'Y') as follows:

$$E(XY) = 1(4)\left(\tfrac{1}{4}\right) + 1(10)\left(\tfrac{1}{4}\right) + 3(4)\left(\tfrac{1}{4}\right) + 3(10)\left(\tfrac{1}{4}\right) = 14$$

$$E(X'Y') = 1(4)(0) + 1(10)\left(\tfrac{1}{2}\right) + 3(4)\left(\tfrac{1}{2}\right) + 3(10)(0) = 11$$

Since $\mu_X = \mu_{X'} = 2$ and $\mu_Y = \mu_{Y'} = 7$, we obtain

$$\operatorname{cov}(X, Y) = E(XY) - \mu_X \mu_Y = 0 \quad\text{and}\quad \operatorname{cov}(X', Y') = E(X'Y') - \mu_{X'} \mu_{Y'} = -3$$

(a)                                  (b)
X \ Y | 4    10   | Sum              X' \ Y' | 4    10   | Sum
1     | 1/4  1/4  | 1/2              1       | 0    1/2  | 1/2
3     | 1/4  1/4  | 1/2              3       | 1/2  0    | 1/2
Sum   | 1/2  1/2  |                  Sum     | 1/2  1/2  |

Fig. 5-6


Remark: The notion of a joint distribution h is extended to any finite number of random variables
X, Y, ..., Z in the obvious way, that is, h is a function on the product set $R_X \times R_Y \times \cdots \times R_Z$ defined by

$$h(x_i, y_j, \ldots, z_k) = P(X = x_i, Y = y_j, \ldots, Z = z_k)$$

As in the case of two random variables, h defines a probability space on the product set $R_X \times R_Y \times \cdots \times R_Z$.


5.7 INDEPENDENT RANDOM VARIABLES 


Let X, Y, ..., Z be random variables on the same sample space S. Then X, Y, ..., Z are said to
be independent if, for any values $x_i, y_j, \ldots, z_k$, we have

$$P(X = x_i, Y = y_j, \ldots, Z = z_k) = P(X = x_i)\,P(Y = y_j) \cdots P(Z = z_k)$$

In particular, X and Y are independent if

$$P(X = x_i, Y = y_j) = P(X = x_i)\,P(Y = y_j)$$

Now suppose X and Y have respective distributions f and g and joint distribution h. Then the above
equation may be written as

$$h(x_i, y_j) = f(x_i)\,g(y_j)$$

Thus, random variables X and Y are independent if each entry $h(x_i, y_j)$ in the joint distribution table
is the product of its marginal entries $f(x_i)$ and $g(y_j)$.


EXAMPLE 5.15 Let X and Y be random variables with the joint distribution in Fig. 5-7. Then X and Y are
independent random variables since each entry in the joint distribution can be obtained by multiplying its marginal
entries. For example,

P(1, 2) = P(X = 1)P(Y = 2) = (0.30)(0.20) = 0.06
P(1, 3) = P(X = 1)P(Y = 3) = (0.30)(0.50) = 0.15
P(1, 4) = P(X = 1)P(Y = 4) = (0.30)(0.30) = 0.09

And so on.

X \ Y | 2     3     4     | Sum
1     | 0.06  0.15  0.09  | 0.30
2     | 0.14  0.35  0.21  | 0.70
Sum   | 0.20  0.50  0.30  |

Fig. 5-7


EXAMPLE 5.16 A fair coin is tossed twice giving the equiprobable space S = {HH, HT, TH, TT}.
(a) Let X and Y be random variables on S defined as follows:

(i) X = 1 if the first toss is H and X = 0 otherwise.
(ii) Y = 1 if both tosses are H and Y = 0 otherwise.

The joint distribution of X and Y appears in Fig. 5-8(a). Note that X and Y are not independent random
variables. For example, P(0, 0) is not equal to the product of the marginal entries. Namely,

$$P(0, 0) = \frac{1}{2} \ne \left(\frac{1}{2}\right)\left(\frac{3}{4}\right) = P(X = 0)\,P(Y = 0)$$

(b) Now let X and Y be random variables on S defined as follows:

(i) X = 1 if the first toss is H and X = 0 otherwise.
(ii) Y = 1 if the second toss is H and Y = 0 otherwise.

The joint distribution of X and Y appears in Fig. 5-8(b). Note that X and Y are now independent.
Specifically, each of the four entries is 1/4, and each entry is the product of its marginal entries:

$$P(i, j) = \frac{1}{4} = \left(\frac{1}{2}\right)\left(\frac{1}{2}\right) = P(X = i)\,P(Y = j)$$

(a)                                (b)
X \ Y | 0    1    | Sum            X \ Y | 0    1    | Sum
0     | 1/2  0    | 1/2            0     | 1/4  1/4  | 1/2
1     | 1/4  1/4  | 1/2            1     | 1/4  1/4  | 1/2
Sum   | 3/4  1/4  |                Sum   | 1/2  1/2  |

Fig. 5-8
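
The independence test used in Examples 5.15 and 5.16 (every joint entry equals the product of its marginal entries) can be sketched as follows for the two-toss coin space. The function names `joint` and `is_independent` are assumptions made for this illustration.

```python
from fractions import Fraction
from itertools import product

space = ["HH", "HT", "TH", "TT"]     # equiprobable outcomes of two tosses
p = Fraction(1, len(space))

def joint(x_of, y_of):
    """Joint distribution {(x, y): probability} of two random variables on space."""
    h = {}
    for s in space:
        key = (x_of(s), y_of(s))
        h[key] = h.get(key, 0) + p
    return h

def is_independent(h):
    """True if every entry h(x, y) equals the product of its marginals f(x) g(y)."""
    xs = {x for x, _ in h}
    ys = {y for _, y in h}
    f = {x: sum(h.get((x, y), 0) for y in ys) for x in xs}   # row sums
    g = {y: sum(h.get((x, y), 0) for x in xs) for y in ys}   # column sums
    return all(h.get((x, y), 0) == f[x] * g[y] for x, y in product(xs, ys))

first_is_h = lambda s: int(s[0] == "H")
second_is_h = lambda s: int(s[1] == "H")
both_h = lambda s: int(s == "HH")

print(is_independent(joint(first_is_h, both_h)))       # part (a)
print(is_independent(joint(first_is_h, second_is_h)))  # part (b)
```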


The following theorems (proved in Problems 5.49 and 5.50) give important properties of 
independent random variables which do not hold in general. 


Theorem 5.8: Let X and Y be independent random variables. Then: 
(i) E(XY) = E(X)E(Y), 
(ii) var(X + Y) = var(X) + var(Y), 
(iii) cov(X, Y) = 0. 


Part (ii) in the above theorem generalizes as follows: 


Theorem 5.9: Let $X_1, X_2, \ldots, X_n$ be independent random variables. Then

$$\operatorname{var}(X_1 + \cdots + X_n) = \operatorname{var}(X_1) + \cdots + \operatorname{var}(X_n)$$


5.8 FUNCTIONS OF A RANDOM VARIABLE

Let X and Y be random variables on the same sample space S. Then Y is said to be a function
of X if Y can be represented $Y = \Phi(X)$ for some real-valued function $\Phi$ of a real variable, that is, if
$Y(s) = \Phi[X(s)]$ for every $s \in S$. For example, $kX$, $X^2$, $X + k$, and $(X + k)^2$ are all functions of X with
$\Phi(x) = kx$, $x^2$, $x + k$, and $(x + k)^2$, respectively. We have the following fundamental result (proved
in Problem 5.43):

Theorem 5.10: Let X and Y be random variables on the same sample space S with $Y = \Phi(X)$.
Then

$$E(Y) = \sum_{i=1}^{n} \Phi(x_i)\, f(x_i)$$

where f is the distribution function of X.

Similarly, a random variable Z is said to be a function of X and Y if Z can be represented
$Z = \Phi(X, Y)$ where $\Phi$ is a real-valued function of two real variables, that is, if

$$Z(s) = \Phi[X(s), Y(s)]$$

for every $s \in S$. For example, X + Y is a function of X and Y with $\Phi(x, y) = x + y$.

Corresponding to the above theorem, we have the following analogous result:

Theorem 5.11: Let X, Y, Z be random variables on the same sample space S with $Z = \Phi(X, Y)$.
Then

$$E(Z) = \sum_{i,j} \Phi(x_i, y_j)\, h(x_i, y_j)$$

where h is the joint distribution of X and Y.

We note that the above two theorems have been used implicitly in the preceding discussion and
theorems. The proof of Theorem 5.11 will be given as a supplementary problem; it generalizes to a
function of m random variables in the obvious way.


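EXAMPLE 5.17 Let X and Y be the dependent (nonindependent) random variables in Example 5.16(a), and let
Z = X + Y. We show that

E(Z) = E(X) + E(Y) but var(Z) ≠ var(X) + var(Y)

[Thus, Theorem 5.8 need not hold for dependent (nonindependent) random variables.]
The joint distribution of X and Y appears in Fig. 5-8(a). The right marginal distribution is the distribution
of X; hence

$$\mu_X = E(X) = 0\left(\tfrac{1}{2}\right) + 1\left(\tfrac{1}{2}\right) = \tfrac{1}{2} \quad\text{and}\quad E(X^2) = 0^2\left(\tfrac{1}{2}\right) + 1^2\left(\tfrac{1}{2}\right) = \tfrac{1}{2}$$

$$\operatorname{var}(X) = E(X^2) - \mu_X^2 = \tfrac{1}{2} - \tfrac{1}{4} = \tfrac{1}{4}$$

The bottom marginal distribution is the distribution of Y; hence

$$\mu_Y = E(Y) = 0\left(\tfrac{3}{4}\right) + 1\left(\tfrac{1}{4}\right) = \tfrac{1}{4} \quad\text{and}\quad E(Y^2) = 0^2\left(\tfrac{3}{4}\right) + 1^2\left(\tfrac{1}{4}\right) = \tfrac{1}{4}$$

$$\operatorname{var}(Y) = E(Y^2) - \mu_Y^2 = \tfrac{1}{4} - \tfrac{1}{16} = \tfrac{3}{16}$$

The random variable Z = X + Y assumes the values 0, 1, 2 with respective probabilities 1/2, 1/4, 1/4. Thus

$$\mu_Z = E(Z) = 0\left(\tfrac{1}{2}\right) + 1\left(\tfrac{1}{4}\right) + 2\left(\tfrac{1}{4}\right) = \tfrac{3}{4} \quad\text{and}\quad E(Z^2) = 0^2\left(\tfrac{1}{2}\right) + 1^2\left(\tfrac{1}{4}\right) + 2^2\left(\tfrac{1}{4}\right) = \tfrac{5}{4}$$

$$\operatorname{var}(Z) = E(Z^2) - \mu_Z^2 = \tfrac{5}{4} - \tfrac{9}{16} = \tfrac{11}{16}$$

Therefore

$$E(X) + E(Y) = \tfrac{1}{2} + \tfrac{1}{4} = \tfrac{3}{4} = E(Z) \quad\text{but}\quad \operatorname{var}(X) + \operatorname{var}(Y) = \tfrac{1}{4} + \tfrac{3}{16} = \tfrac{7}{16} \ne \tfrac{11}{16} = \operatorname{var}(Z)$$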


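EXAMPLE 5.18 Let X and Y be the independent random variables in Example 5.16(b), and let Z = X + Y. We
show that

E(Z) = E(X) + E(Y) and var(Z) = var(X) + var(Y)

The equation for E(Z) is always true, and the equation for var(Z) is expected since X and Y are independent.

The joint distribution of X and Y appears in Fig. 5-8(b). The right and bottom marginal distributions are the
distributions of X and Y, respectively, and they are identical. Thus

$$\mu_X = \mu_Y = 0\left(\tfrac{1}{2}\right) + 1\left(\tfrac{1}{2}\right) = \tfrac{1}{2} \quad\text{and}\quad E(X^2) = E(Y^2) = 0^2\left(\tfrac{1}{2}\right) + 1^2\left(\tfrac{1}{2}\right) = \tfrac{1}{2}$$

$$\operatorname{var}(X) = \operatorname{var}(Y) = E(X^2) - \mu_X^2 = \tfrac{1}{2} - \tfrac{1}{4} = \tfrac{1}{4}$$

The random variable Z = X + Y assumes the values 0, 1, 2 but now with respective probabilities 1/4, 1/2,
1/4. Thus:

$$\mu_Z = E(Z) = 0\left(\tfrac{1}{4}\right) + 1\left(\tfrac{1}{2}\right) + 2\left(\tfrac{1}{4}\right) = 1 \quad\text{and}\quad E(Z^2) = 0^2\left(\tfrac{1}{4}\right) + 1^2\left(\tfrac{1}{2}\right) + 2^2\left(\tfrac{1}{4}\right) = \tfrac{3}{2}$$

$$\operatorname{var}(Z) = E(Z^2) - \mu_Z^2 = \tfrac{3}{2} - 1 = \tfrac{1}{2}$$

Therefore

$$E(X) + E(Y) = \tfrac{1}{2} + \tfrac{1}{2} = 1 = E(Z)$$

and

$$\operatorname{var}(X) + \operatorname{var}(Y) = \tfrac{1}{4} + \tfrac{1}{4} = \tfrac{1}{2} = \operatorname{var}(Z)$$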
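
The contrast between Examples 5.17 and 5.18 can be reproduced by computing variances directly on the two-toss sample space. This is a minimal sketch; the helper `var` and the lambda names are assumptions made for illustration.

```python
from fractions import Fraction

space = ["HH", "HT", "TH", "TT"]     # equiprobable outcomes of two tosses
p = Fraction(1, 4)

def var(rv):
    """Variance of a random variable rv: outcome -> number, via E(X^2) - mean^2."""
    mu = sum(rv(s) * p for s in space)
    return sum(rv(s) ** 2 * p for s in space) - mu**2

x = lambda s: int(s[0] == "H")       # first toss is H
y_dep = lambda s: int(s == "HH")     # both tosses are H (depends on x)
y_ind = lambda s: int(s[1] == "H")   # second toss is H (independent of x)

var_z_dep = var(lambda s: x(s) + y_dep(s))
var_z_ind = var(lambda s: x(s) + y_ind(s))

print(var(x) + var(y_dep), var_z_dep)   # sums differ in the dependent case
print(var(x) + var(y_ind), var_z_ind)   # sums agree in the independent case
```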


5.9 DISCRETE RANDOM VARIABLES IN GENERAL 


Now suppose X is a random variable on a sample space S with a countably infinite range space,
say $R_X = \{x_1, x_2, \ldots\}$. As in the finite case, X induces a function f on $R_X$, called the distribution of X,
defined by

$$f(x_i) = P(X = x_i)$$

The distribution is frequently presented by a table as follows:

x    | x_1     x_2     x_3     ...
f(x) | f(x_1)  f(x_2)  f(x_3)  ...

The distribution f has the following two properties:

(i) $f(x_i) \ge 0$, (ii) $\sum_{i=1}^{\infty} f(x_i) = 1$

Thus, $R_X$ with the above assignment of probabilities is a probability space.
The expectation E(X) and variance var(X) of the above random variable X are defined by the
following series when the relevant series converge absolutely:

$$E(X) = x_1 f(x_1) + x_2 f(x_2) + x_3 f(x_3) + \cdots = \sum_{i=1}^{\infty} x_i f(x_i)$$

$$\operatorname{var}(X) = (x_1 - \mu)^2 f(x_1) + (x_2 - \mu)^2 f(x_2) + \cdots = \sum_{i=1}^{\infty} (x_i - \mu)^2 f(x_i)$$

It can be shown that var(X) exists if and only if $\mu = E(X)$ and $E(X^2)$ both exist and in this case the
following formula holds just as in the finite case:

$$\operatorname{var}(X) = E(X^2) - \mu^2$$

When var(X) does exist, the standard deviation $\sigma_X$ is defined just as in the finite case:

$$\sigma_X = \sqrt{\operatorname{var}(X)}$$

The notions of joint distribution, independent random variables, and functions of random
variables are the same as in the finite case. Moreover, suppose X and Y are defined on the same
sample space S and var(X) and var(Y) both exist. Then the covariance of X and Y, written cov(X, Y),
is defined by the following series which can also be shown to converge absolutely:

$$\operatorname{cov}(X, Y) = \sum_{i,j} (x_i - \mu_X)(y_j - \mu_Y)\, h(x_i, y_j)$$

In addition, the relation

$$\operatorname{cov}(X, Y) = \sum_{i,j} x_i y_j\, h(x_i, y_j) - \mu_X \mu_Y = E(XY) - \mu_X \mu_Y$$

holds just as in the finite case.


Remark: To avoid technical difficulties, we will establish many of the theorems in this chapter 
only for finite random variables. 


5.10 CONTINUOUS RANDOM VARIABLES 


Suppose that X is a random variable on a sample space S whose range space $R_X$ is a continuum
of numbers such as an interval. Recall from the definition of a random variable that the set
$\{a \le X \le b\}$ is an event in S and therefore the probability $P(a \le X \le b)$ is well defined. We assume
there is a piecewise continuous function $f: \mathbf{R} \to \mathbf{R}$ such that $P(a \le X \le b)$ is equal to the area under
the graph of f between x = a and x = b, as shown in Fig. 5-9. In the language of calculus

$$P(a \le X \le b) = \int_a^b f(x)\, dx$$

In this case X is said to be a continuous random variable. The function f is called the distribution or
the continuous probability function (or density function) of X; it satisfies the conditions:

(i) $f(x) \ge 0$, and (ii) $\int_{-\infty}^{\infty} f(x)\, dx = 1$

That is, f is nonnegative and the total area under its graph is 1.

The expectation E(X) for a continuous random variable X is defined by the following integral when
it exists:

$$E(X) = \int_{-\infty}^{\infty} x f(x)\, dx$$

Functions of random variables are defined just as in the discrete case. Furthermore, if $Y = \Phi(X)$ then
it can be shown that

$$E(Y) = \int_{-\infty}^{\infty} \Phi(x) f(x)\, dx$$

when it exists.

Fig. 5-9. $P(a \le X \le b)$ = area of shaded region.

The variance var(X) is defined by the following integral when it exists:

$$\operatorname{var}(X) = E((X - \mu)^2) = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\, dx$$

Just as in the discrete case, it can be shown that var(X) exists if and only if $\mu = E(X)$ and $E(X^2)$ both
exist and then

$$\operatorname{var}(X) = E(X^2) - \mu^2 = \int_{-\infty}^{\infty} x^2 f(x)\, dx - \mu^2$$

When var(X) does exist, the standard deviation $\sigma_X$ is defined as in the discrete case by

$$\sigma_X = \sqrt{\operatorname{var}(X)}$$


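EXAMPLE 5.19 Let X be a random variable with the following distribution function f:

$$f(x) = \begin{cases} \tfrac{1}{2}x & \text{if } 0 \le x \le 2 \\ 0 & \text{elsewhere} \end{cases}$$

The graph of f appears in Fig. 5-10. Then

$$P(1 \le X \le 1.5) = \text{area of shaded region in diagram} = \frac{5}{16}$$

Fig. 5-10

Using calculus, we are able to compute the expectation, variance, and standard deviation of X as follows:

$$E(X) = \int_{-\infty}^{\infty} x f(x)\, dx = \int_0^2 \tfrac{1}{2}x^2\, dx = \left[\frac{x^3}{6}\right]_0^2 = \frac{4}{3}$$

$$E(X^2) = \int_{-\infty}^{\infty} x^2 f(x)\, dx = \int_0^2 \tfrac{1}{2}x^3\, dx = \left[\frac{x^4}{8}\right]_0^2 = 2$$

$$\operatorname{var}(X) = E(X^2) - \mu^2 = 2 - \frac{16}{9} = \frac{2}{9} \quad\text{and}\quad \sigma_X = \sqrt{\frac{2}{9}} = \frac{1}{3}\sqrt{2}$$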
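
The integrals in Example 5.19 can also be approximated numerically. The sketch below uses a simple midpoint rule written by hand; the function names `density` and `integrate` and the step count are assumptions made for this illustration, and the results are approximations rather than exact values.

```python
def density(x):
    """Density from Example 5.19: f(x) = x/2 on [0, 2], 0 elsewhere."""
    return 0.5 * x if 0 <= x <= 2 else 0.0

def integrate(func, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    width = (b - a) / steps
    return sum(func(a + (i + 0.5) * width) for i in range(steps)) * width

total = integrate(density, 0, 2)                           # should be close to 1
prob = integrate(density, 1, 1.5)                          # P(1 <= X <= 1.5)
mean = integrate(lambda x: x * density(x), 0, 2)           # E(X)
second_moment = integrate(lambda x: x**2 * density(x), 0, 2)
variance = second_moment - mean**2                         # E(X^2) - mean^2

print(total, prob, mean, variance)
```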

Independent Continuous Random Variables

A finite number of continuous random variables X, Y, ..., Z are said to be independent if, for any
intervals [a, a'], [b, b'], ..., [c, c'], we have

$$P(a \le X \le a',\ b \le Y \le b',\ \ldots,\ c \le Z \le c') = P(a \le X \le a')\,P(b \le Y \le b') \cdots P(c \le Z \le c')$$

Observe that intervals play the same role in the continuous case as points did in the discrete case.

5.11 CUMULATIVE DISTRIBUTION FUNCTION

Let X be a random variable (discrete or continuous). The cumulative distribution function F of
X is the function $F: \mathbf{R} \to \mathbf{R}$ defined by

$$F(a) = P(X \le a)$$

Suppose X is a discrete random variable with distribution f. Then F is the "step function"
defined by

$$F(x) = \sum_{x_i \le x} f(x_i)$$

On the other hand, suppose X is a continuous random variable with distribution f. Then

$$F(x) = \int_{-\infty}^{x} f(t)\, dt$$

In either case, F has the following two properties:

(i) F is monotonically increasing, that is,
$F(a) \le F(b)$ whenever $a \le b$
(ii) The limit of F to the left is 0 and to the right is 1:
$\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to \infty} F(x) = 1$



EXAMPLE 5.20 Let X be a discrete random variable with the following distribution function f:

x    | -2   1    2    4
f(x) | 1/4  1/8  1/2  1/8

The graph of the cumulative distribution function F of X appears in Fig. 5-11. Observe that F is a "step function"
with a step at $x_i$ with height $f(x_i)$.

Fig. 5-11. Graph of F.

EXAMPLE 5.21 Let X be a continuous random variable with the following distribution function f:

$$f(x) = \begin{cases} \tfrac{1}{2}x & \text{if } 0 \le x \le 2 \\ 0 & \text{elsewhere} \end{cases}$$

The cumulative distribution function F of X follows:

$$F(x) = \begin{cases} 0 & \text{if } x < 0 \\ \tfrac{1}{4}x^2 & \text{if } 0 \le x \le 2 \\ 1 & \text{if } x > 2 \end{cases}$$

Here we use the fact that, for $0 \le x \le 2$,

$$F(x) = \int_0^x \tfrac{1}{2}t\, dt = \tfrac{1}{4}x^2$$

Fig. 5-12. (a) Graph of f; (b) Graph of F.

The graph of f appears in Fig. 5-12(a) and the graph of F appears in Fig. 5-12(b).
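
The two cumulative distribution functions in Examples 5.20 and 5.21 can be written as short functions. This is an illustrative sketch; the names `cdf_discrete` and `cdf_continuous` are assumptions, not notation from the text.

```python
from fractions import Fraction

# Distribution f from Example 5.20, as {value: probability}.
dist = {-2: Fraction(1, 4), 1: Fraction(1, 8), 2: Fraction(1, 2), 4: Fraction(1, 8)}

def cdf_discrete(x):
    """Step function F(x): sum of f(x_i) over all values x_i <= x."""
    return sum((p for value, p in dist.items() if value <= x), Fraction(0))

def cdf_continuous(x):
    """F(x) for the density f(x) = x/2 on [0, 2] from Example 5.21."""
    if x < 0:
        return 0.0
    if x <= 2:
        return x * x / 4
    return 1.0

# F jumps by f(x_i) at each x_i and is constant in between.
print([cdf_discrete(x) for x in (-3, -2, 0, 1, 2, 3, 4)])
print([cdf_continuous(x) for x in (-1, 0, 1, 2, 3)])
```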


5.12 CHEBYSHEV’S INEQUALITY AND THE LAW OF LARGE NUMBERS 


The standard deviation $\sigma$ of a random variable X measures the spread of the values of X about
the mean $\mu$ of X. Accordingly, for smaller values of $\sigma$, we would expect that X will be closer to its
mean $\mu$. This intuitive expectation is made more precise by the following inequality, named after the
Russian mathematician P. L. Chebyshev (1821-1894):

Theorem 5.12 (Chebyshev's Inequality): Let X be a random variable with mean $\mu$ and standard
deviation $\sigma$. Then, for any positive number k, the probability that a value of X lies
in the interval $[\mu - k\sigma, \mu + k\sigma]$ is at least $1 - 1/k^2$. That is,

$$P(\mu - k\sigma \le X \le \mu + k\sigma) \ge 1 - \frac{1}{k^2}$$

A proof of this important theorem is given in Problem 5.51. We illustrate the use of this theorem
in the next example.


EXAMPLE 5.22 Suppose X is a random variable with mean $\mu = 100$ and standard deviation $\sigma = 5$.

(a) Find the conclusion that one can derive from Chebyshev's inequality for k = 2 and k = 3.
Setting k = 2, we get

$$\mu - k\sigma = 100 - 2(5) = 90, \quad \mu + k\sigma = 100 + 2(5) = 110, \quad 1 - \frac{1}{k^2} = 1 - \frac{1}{4} = \frac{3}{4}$$

Thus, from Chebyshev's inequality, we can conclude that the probability that X lies between 90 and 110 is at least
3/4, that is

$$P(90 \le X \le 110) \ge \frac{3}{4}$$

Similarly, setting k = 3, we get

$$\mu - k\sigma = 100 - 3(5) = 85, \quad \mu + k\sigma = 100 + 3(5) = 115, \quad 1 - \frac{1}{k^2} = 1 - \frac{1}{9} = \frac{8}{9}$$

Thus $P(85 \le X \le 115) \ge \frac{8}{9}$.

(b) Estimate the probability that X lies between 100 - 20 = 80 and 100 + 20 = 120.
Here $k\sigma = 20$. Since $\sigma = 5$, we get 5k = 20 and so k = 4. Thus, by Chebyshev's inequality,

$$P(80 \le X \le 120) \ge 1 - \frac{1}{k^2} = 1 - \frac{1}{4^2} = \frac{15}{16} \approx 0.94$$

(c) Find an interval [a, b] about the mean $\mu = 100$ for which the probability that X lies in the interval is at least
99 percent.
Here we set $1 - 1/k^2 = 0.99$ and solve for k. This yields

$$1 - 0.99 = \frac{1}{k^2} \quad\text{or}\quad 0.01 = \frac{1}{k^2} \quad\text{or}\quad k^2 = \frac{1}{0.01} = 100 \quad\text{or}\quad k = 10$$

Thus, the desired interval is

$$[a, b] = [\mu - k\sigma, \mu + k\sigma] = [100 - 10(5), 100 + 10(5)] = [50, 150]$$
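
The three calculations in Example 5.22 follow one pattern, which the sketch below captures. The function names `chebyshev_interval` and `k_for_probability` are assumptions made for this illustration; the second simply solves $1 - 1/k^2 = p$ for k.

```python
from math import sqrt

def chebyshev_interval(mu, sigma, k):
    """Return the interval [mu - k*sigma, mu + k*sigma] and the lower bound 1 - 1/k^2."""
    return (mu - k * sigma, mu + k * sigma), 1 - 1 / k**2

def k_for_probability(prob):
    """Smallest k for which Chebyshev's bound 1 - 1/k^2 reaches prob (0 < prob < 1)."""
    return sqrt(1 / (1 - prob))

mu, sigma = 100, 5

for k in (2, 3, 4):
    interval, bound = chebyshev_interval(mu, sigma, k)
    print(k, interval, round(bound, 4))

k99 = k_for_probability(0.99)
print(round(k99, 6), chebyshev_interval(mu, sigma, round(k99))[0])
```

Note that the bound is only a guarantee: for a particular distribution the actual probability may be much larger than $1 - 1/k^2$.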


Sample Mean and the Law of Large Numbers 


The intuitive idea of probability is the so-called law of averages, that is, if an event A occurs with
probability p, then the "average number of occurrences of A" approaches p as the number n of
(independent) trials increases. This concept is made precise by the law of large numbers which is
stated below. First, however, we need to define the notion of the sample mean.

Let X be the random variable corresponding to some experiment. The notion of n independent
trials of the experiment was defined above. We may view the numerical value of each particular trial
to be a random variable with the same mean as X. Specifically, we let $X_k$ denote the outcome of the
kth trial where k = 1, 2, ..., n. The average value of all n outcomes is also a random variable, denoted
by $\bar{X}_n$ and called the sample mean. That is,

$$\bar{X}_n = \frac{X_1 + X_2 + \cdots + X_n}{n}$$

The law of large numbers says that as n increases, the probability that the value of the sample mean $\bar{X}_n$
is close to $\mu$ approaches 1.


EXAMPLE 5.23 Suppose a fair die is tossed 8 times with the following outcomes:

$$x_1 = 2,\ x_2 = 5,\ x_3 = 4,\ x_4 = 1,\ x_5 = 4,\ x_6 = 6,\ x_7 = 3,\ x_8 = 2$$

Then the corresponding value of the sample mean $\bar{X}_8$ follows:

$$\bar{x}_8 = \frac{2 + 5 + 4 + 1 + 4 + 6 + 3 + 2}{8} = \frac{27}{8} = 3.375$$

For a fair die, the mean $\mu = 3.5$. The law of large numbers tells us that as n gets larger, the probability that the
sample mean $\bar{X}_n$ will get close to 3.5 becomes larger and, in fact, approaches one.
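
The behavior described in Example 5.23 can be observed in a small simulation. The sketch below is only an illustration: the helper name `sample_mean`, the seed, and the sample sizes are arbitrary assumptions, and the printed values depend on the random number generator.

```python
import random

def sample_mean(n, rng):
    """Average of n simulated tosses of a fair die."""
    return sum(rng.randint(1, 6) for _ in range(n)) / n

rng = random.Random(0)   # fixed seed so that runs are repeatable

# As n grows, the sample mean tends to settle near the mean 3.5.
for n in (8, 100, 10_000, 100_000):
    print(n, sample_mean(n, rng))
```

A single run does not prove anything about probabilities; it only shows the typical tendency that the law of large numbers makes precise.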


A technical statement of the law of large numbers follows. 


Theorem 5.13 (Law of Large Numbers): For any positive number $\alpha$, no matter how small,

$$P(\mu - \alpha \le \bar{X}_n \le \mu + \alpha) \to 1 \quad\text{as}\quad n \to \infty$$

That is, the probability that the sample mean $\bar{X}_n$ has a value in the interval
$[\mu - \alpha, \mu + \alpha]$ approaches 1 as n approaches infinity.

The following remarks are in order.

Remark 1: We prove Chebyshev's inequality only for the discrete case. The continuous case
follows from an analogous proof which uses integrals instead of summations.

Remark 2: We prove the law of large numbers only in the case that the variance $\operatorname{var}(X_i)$ of the
$X_i$ exists, that is, does not diverge. We note that the theorem is true whenever the expectation $E(X_i)$
exists.

Remark 3: The above law of large numbers is proved in Problem 5.52 using Chebyshev's
inequality. A stronger version of the theorem, called the strong law of large numbers, is given in more
advanced treatments of probability theory.


Solved Problems 


RANDOM VARIABLES AND EXPECTED VALUE 


5.1. Suppose a random variable X takes on the values -3, -1, 2, and 5 with respective
probabilities

$$\frac{2k - 3}{10}, \quad \frac{k - 2}{10}, \quad \frac{k - 1}{10}, \quad \frac{k + 1}{10}$$

(a) Determine the distribution of X. (b) Find the expected value E(X) of X.

(a) Set the sum of the probabilities equal to 1, and solve for k obtaining k = 3. Then put k = 3 into the
above probabilities yielding 0.3, 0.1, 0.2, 0.4. Thus, the distribution of X is as follows:

x        | -3   -1   2    5
P(X = x) | 0.3  0.1  0.2  0.4

(b) The expected value E(X) is obtained by multiplying each value of X by its probability and taking the
sum. Thus

E(X) = (-3)(0.3) + (-1)(0.1) + 2(0.2) + 5(0.4) = 1.4


5.2. A fair coin is tossed 4 times. Let X denote the number of heads occurring. Find:
(a) distribution f of X, (b) E(X), (c) probability graph and histogram of X.

The sample space S is an equiprobable space consisting of $2^4 = 16$ sequences made up of H's
and T's.

(a) Since X is the number of heads, and each sequence consists of four elements, X takes on the values
of 0, 1, 2, 3, 4, that is, $R_X = \{0, 1, 2, 3, 4\}$.
(i) One point, TTTT, has 0 heads; hence f(0) = 1/16.
(ii) Four points, HTTT, THTT, TTHT, TTTH, have 1 head; hence f(1) = 4/16.
(iii) Six points, HHTT, HTHT, HTTH, THHT, THTH, TTHH, have 2 heads; hence f(2) = 6/16.
(iv) Four points, HHHT, HHTH, HTHH, THHH, have 3 heads; hence f(3) = 4/16.
(v) One point, HHHH, has 4 heads; hence f(4) = 1/16.

The distribution f of X follows:

x    | 0     1     2     3     4
f(x) | 1/16  4/16  6/16  4/16  1/16

(b) The expected value E(X) is obtained by multiplying each value of X by its probability and taking the
sum. Hence

$$E(X) = 0\left(\tfrac{1}{16}\right) + 1\left(\tfrac{4}{16}\right) + 2\left(\tfrac{6}{16}\right) + 3\left(\tfrac{4}{16}\right) + 4\left(\tfrac{1}{16}\right) = 2$$

This agrees with our intuition that, when a fair coin is repeatedly tossed, about half of the tosses
should be heads.

(c) The probability bar chart of X appears in Fig. 5-13(a), and the probability histogram appears in
Fig. 5-13(b). One may view the histogram as making the random variable continuous where
X = 1 means X lies between 0.5 and 1.5.

Fig. 5-13. (a) Bar chart; (b) Histogram.


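5.3. A fair coin is tossed until a head or five tails occurs. Find the expected number E of tosses of
the coin.

The sample space S consists of the six points

H, TH, TTH, TTTH, TTTTH, TTTTT

with respective probabilities (independent trials)

$$\frac{1}{2}, \quad \left(\frac{1}{2}\right)^2 = \frac{1}{4}, \quad \left(\frac{1}{2}\right)^3 = \frac{1}{8}, \quad \left(\frac{1}{2}\right)^4 = \frac{1}{16}, \quad \left(\frac{1}{2}\right)^5 = \frac{1}{32}, \quad \left(\frac{1}{2}\right)^5 = \frac{1}{32}$$

The random variable X of interest is the number of tosses in each outcome. Thus

X(H) = 1     X(TTH) = 3     X(TTTTH) = 5
X(TH) = 2    X(TTTH) = 4    X(TTTTT) = 5

These X values are assigned the following probabilities:

$$P(1) = P(H) = \frac{1}{2}, \quad P(2) = P(TH) = \frac{1}{4}, \quad P(3) = P(TTH) = \frac{1}{8}$$

$$P(4) = P(TTTH) = \frac{1}{16}, \quad P(5) = P(\{TTTTH, TTTTT\}) = \frac{1}{32} + \frac{1}{32} = \frac{1}{16}$$

Accordingly, $E = E(X) = 1\left(\tfrac{1}{2}\right) + 2\left(\tfrac{1}{4}\right) + 3\left(\tfrac{1}{8}\right) + 4\left(\tfrac{1}{16}\right) + 5\left(\tfrac{1}{16}\right) = \tfrac{31}{16} \approx 1.9$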
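
The calculation in Problem 5.3 can be organized as a short loop over the toss on which the first head appears. This is a sketch under that reading of the problem; the function name `expected_tosses` and its parameters are assumptions made for illustration.

```python
from fractions import Fraction

def expected_tosses(max_tails=5, p_head=Fraction(1, 2)):
    """Expected number of tosses when tossing stops at the first head
    or after max_tails consecutive tails, whichever comes first."""
    expected = Fraction(0)
    p_all_tails = Fraction(1)   # probability that all earlier tosses were tails
    for toss in range(1, max_tails + 1):
        expected += toss * p_all_tails * p_head   # first head occurs on this toss
        p_all_tails *= 1 - p_head
    expected += max_tails * p_all_tails           # no head in max_tails tosses
    return expected

print(expected_tosses(), float(expected_tosses()))
```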
5.4. A random sample with replacement of size n = 2 is chosen from the set {1, 2, 3} yielding the
9-element equiprobable space

S = {(1,1), (1,2), (1,3), (2,1), (2,2), (2,3), (3,1), (3,2), (3,3)}

Let X denote the sum of the two numbers. (a) Find the distribution f of X. (b) Find the
expected value E(X).

(a) The random variable X assumes the values 2, 3, 4, 5, 6, that is, $R_X = \{2, 3, 4, 5, 6\}$. We compute the
distribution f of X:
(i) One point, (1,1), has sum 2; hence f(2) = 1/9.
(ii) Two points, (1,2), (2,1), have sum 3; hence f(3) = 2/9.
(iii) Three points, (1,3), (2,2), (3,1), have sum 4; hence f(4) = 3/9.
(iv) Two points, (2,3), (3,2), have sum 5; hence f(5) = 2/9.
(v) One point, (3,3), has sum 6; hence f(6) = 1/9.

Thus, the distribution f of X is as follows:

x    | 2    3    4    5    6
f(x) | 1/9  2/9  3/9  2/9  1/9

(b) The expected value E(X) is obtained by multiplying each value of x by its probability and taking the
sum. Hence

$$E(X) = 2\left(\tfrac{1}{9}\right) + 3\left(\tfrac{2}{9}\right) + 4\left(\tfrac{3}{9}\right) + 5\left(\tfrac{2}{9}\right) + 6\left(\tfrac{1}{9}\right) = \tfrac{36}{9} = 4$$

5.5. Let Y denote the minimum of the two numbers in each element of the probability space S in
Problem 5.4. (a) Find the distribution g of Y. (b) Find the expected value E(Y).

(a) The random variable Y only assumes the values 1, 2, 3, that is, $R_Y = \{1, 2, 3\}$. We compute the
distribution g of Y:
(i) Five points, (1,1), (1,2), (1,3), (2,1), (3,1), have minimum 1; hence g(1) = 5/9.
(ii) Three points, (2,2), (2,3), (3,2), have minimum 2; hence g(2) = 3/9.
(iii) One point, (3,3), has minimum 3; hence g(3) = 1/9.

Thus the distribution g of Y is as follows:

y    | 1    2    3
g(y) | 5/9  3/9  1/9

(b) Multiply each value of y by its probability and take the sum obtaining:

$$E(Y) = 1\left(\tfrac{5}{9}\right) + 2\left(\tfrac{3}{9}\right) + 3\left(\tfrac{1}{9}\right) = \tfrac{14}{9} \approx 1.56$$

5.6. Five cards are numbered 1 to 5. Two cards are drawn at random (without replacement) to
yield the following equiprobable space S with C(5, 2) = 10 elements:

S = {{1,2}, {1,3}, {1,4}, {1,5}, {2,3}, {2,4}, {2,5}, {3,4}, {3,5}, {4,5}}

Let X denote the sum of the numbers drawn. (a) Find the distribution f of X. (b) Find
E(X).

(a) The random variable X assumes the values 3, 4, 5, 6, 7, 8, 9, that is, $R_X = \{3, 4, 5, 6, 7, 8, 9\}$. The
distribution f of X is obtained as follows:

(i) One point, {1,2}, has sum 3; hence f(3) = 0.1.
(ii) One point, {1,3}, has sum 4; hence f(4) = 0.1.
(iii) Two points, {1,4}, {2,3}, have sum 5; hence f(5) = 0.2.

And so on. This yields the following distribution f of X:

x    | 3    4    5    6    7    8    9
f(x) | 0.1  0.1  0.2  0.2  0.2  0.1  0.1

(b) The expected value E(X) is obtained by multiplying each value x of X by its probability f(x) and
taking the sum. Thus

E(X) = 3(0.1) + 4(0.1) + 5(0.2) + 6(0.2) + 7(0.2) + 8(0.1) + 9(0.1) = 6


5.7. Let Y denote the minimum of the two numbers in each element of the probability space S in
Problem 5.6. (a) Find the distribution g of Y. (b) Find E(Y).

(a) The random variable Y only assumes the values 1, 2, 3, 4, that is, $R_Y = \{1, 2, 3, 4\}$. The distribution
g of Y is obtained as follows:

(i) Four points, {1,2}, {1,3}, {1,4}, {1,5}, have minimum 1; hence g(1) = 0.4.
(ii) Three points, {2,3}, {2,4}, {2,5}, have minimum 2; hence g(2) = 0.3.
(iii) Two points, {3,4}, {3,5}, have minimum 3; hence g(3) = 0.2.
(iv) One point, {4,5}, has minimum 4; hence g(4) = 0.1.

Thus, the distribution g of Y is as follows:

y    | 1    2    3    4
g(y) | 0.4  0.3  0.2  0.1

(b) Multiply each value of y by its probability g(y) and take the sum, obtaining

E(Y) = 1(0.4) + 2(0.3) + 3(0.2) + 4(0.1) = 2.0


5.8. A player tosses two fair coins yielding the equiprobable space

S = {HH, HT, TH, TT}

The player wins $2 if 2 heads occur and $1 if 1 head occurs. On the other hand, the player loses
$3 if no heads occur. Find the expected value E of the game. Is the game fair? (The game
is fair, favorable, or unfavorable to the player accordingly as E = 0, E > 0, or E < 0.)

Let X denote the player's gain to yield

X(HH) = $2, X(HT) = X(TH) = $1, X(TT) = -$3

Thus, the distribution of X is as follows:

x        | 2    1    -3
P(X = x) | 1/4  2/4  1/4

The expectation of X follows:

$$E(X) = 2\left(\tfrac{1}{4}\right) + 1\left(\tfrac{2}{4}\right) - 3\left(\tfrac{1}{4}\right) = \tfrac{1}{4} = 0.25$$

Since E(X) > 0, the game is favorable to the player.


5.9. A player tosses two fair coins. The player wins $3 if 2 heads occur and $1 if 1 head
occurs. For the game to be fair, how much should the player lose if no heads occur?

Let Y denote the player's gain; then the distribution of Y is as follows where k denotes the unknown
payoff to the player:

y        | 3    1    k
P(Y = y) | 1/4  2/4  1/4

Then

$$E(Y) = 3\left(\tfrac{1}{4}\right) + 1\left(\tfrac{2}{4}\right) + k\left(\tfrac{1}{4}\right) = \frac{5 + k}{4}$$

For a fair game, E(Y) should be zero. This yields k = -5. Thus, the player should lose $5 if no heads
occur.


5.10. A box contains 8 lightbulbs of which 3 are defective. A bulb is selected from the box and 
tested. If it is defective, another bulb is selected and tested, until a nondefective bulb is 
chosen. Find the expected number E of bulbs chosen. 


Writing D for defective and N for nondefective, the sample space S has the four elements 
N, DN, DDN, DDDN 
with respective probabilities 


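$$\frac{5}{8}, \qquad \frac{3}{8}\cdot\frac{5}{7} = \frac{15}{56}, \qquad \frac{3}{8}\cdot\frac{2}{7}\cdot\frac{5}{6} = \frac{5}{56}, \qquad \frac{3}{8}\cdot\frac{2}{7}\cdot\frac{1}{6}\cdot\frac{5}{5} = \frac{1}{56}$$


The number X of bulbs chosen has the values 
X(N) = 1, X(DN) = 2, X(DDN) = 3, X(DDDN) = 4 


with the above respective probabilities. Hence 

$$E(X) = 1\left(\frac{5}{8}\right) + 2\left(\frac{15}{56}\right) + 3\left(\frac{5}{56}\right) + 4\left(\frac{1}{56}\right) = \frac{3}{2} = 1.5$$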
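
The same expectation can be accumulated draw by draw. This is a hedged sketch rather than the book's method: the function `expected_draws` (a name introduced here) multiplies out the "all defective so far" probability and adds the contribution of stopping at each draw.

```python
from fractions import Fraction


def expected_draws(total, defective):
    """Expected number of draws without replacement until the first nondefective item."""
    expected = Fraction(0)
    p_all_defective_so_far = Fraction(1)
    # The search must stop by draw number defective + 1.
    for draw in range(1, defective + 2):
        remaining = total - (draw - 1)
        remaining_defective = defective - (draw - 1)
        p_good_now = Fraction(remaining - remaining_defective, remaining)
        expected += draw * p_all_defective_so_far * p_good_now
        p_all_defective_so_far *= Fraction(remaining_defective, remaining)
    return expected


e_bulbs = expected_draws(total=8, defective=3)
print(e_bulbs)
```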


5.11. A coin is weighted so that P(H) = 3/4 and P(T) = 1/4. The coin is tossed 3 times yielding the 
following 8-element probability space: 


S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT} 
Let X denote the number of heads that appears. (a) Find the distribution f of X. (b) Find 


E(X). 
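(a) The points in S have the following respective probabilities: 

$$P(HHH) = \tfrac{3}{4}\cdot\tfrac{3}{4}\cdot\tfrac{3}{4} = \tfrac{27}{64} \qquad P(THH) = \tfrac{1}{4}\cdot\tfrac{3}{4}\cdot\tfrac{3}{4} = \tfrac{9}{64}$$
$$P(HHT) = \tfrac{3}{4}\cdot\tfrac{3}{4}\cdot\tfrac{1}{4} = \tfrac{9}{64} \qquad P(THT) = \tfrac{1}{4}\cdot\tfrac{3}{4}\cdot\tfrac{1}{4} = \tfrac{3}{64}$$
$$P(HTH) = \tfrac{3}{4}\cdot\tfrac{1}{4}\cdot\tfrac{3}{4} = \tfrac{9}{64} \qquad P(TTH) = \tfrac{1}{4}\cdot\tfrac{1}{4}\cdot\tfrac{3}{4} = \tfrac{3}{64}$$
$$P(HTT) = \tfrac{3}{4}\cdot\tfrac{1}{4}\cdot\tfrac{1}{4} = \tfrac{3}{64} \qquad P(TTT) = \tfrac{1}{4}\cdot\tfrac{1}{4}\cdot\tfrac{1}{4} = \tfrac{1}{64}$$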


Since X denotes the number of heads, 
X(HHH) = 3          X(HHT) = X(HTH) = X(THH) = 2 
X(HTT) = X(THT) = X(TTH) = 1          X(TTT) = 0 


Thus, $R_X = \{0, 1, 2, 3\}$ is the range space of X. Also, 

$$f(0) = P(TTT) = \tfrac{1}{64} \qquad f(2) = P(HHT) + P(HTH) + P(THH) = \tfrac{27}{64}$$
$$f(1) = P(HTT) + P(THT) + P(TTH) = \tfrac{9}{64} \qquad f(3) = P(HHH) = \tfrac{27}{64}$$

The distribution of X follows: 

x    | 0     1     2      3
f(x) | 1/64  9/64  27/64  27/64


(b) Multiply each value of x by its probability f(x) and take the sum to obtain 

$$E(X) = 0\left(\tfrac{1}{64}\right) + 1\left(\tfrac{9}{64}\right) + 2\left(\tfrac{27}{64}\right) + 3\left(\tfrac{27}{64}\right) = \tfrac{144}{64} = 2.25$$
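
As a cross-check, the distribution of the number of heads can be built by enumerating all sequences of tosses and multiplying the per-toss probabilities. The following sketch assumes independent tosses; `heads_distribution` is an illustrative name, not something defined in the text.

```python
from fractions import Fraction
from itertools import product


def heads_distribution(p_heads, tosses):
    """Return {number of heads: probability} for independent tosses of a weighted coin."""
    dist = {}
    for outcome in product("HT", repeat=tosses):
        prob = Fraction(1)
        for face in outcome:
            prob *= p_heads if face == "H" else 1 - p_heads
        heads = outcome.count("H")
        dist[heads] = dist.get(heads, Fraction(0)) + prob
    return dict(sorted(dist.items()))


f = heads_distribution(Fraction(3, 4), 3)
mean_heads = sum(x * p for x, p in f.items())
print(f, mean_heads)
```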


5.12. Concentric circles of radius 1 and 3 in are drawn on a circular target of radius 5 in as pictured 
in Fig. 5-14. A person fires at the target and, as indicated by Fig. 5-14, receives 10, 5, or 3 points 
according to whether the target is hit inside the smaller circle, inside the middle annular region, 
or inside the outer annular region, respectively. Suppose the person hits the target with 
probability 1/2, and then is just as likely to hit one point of the target as any other. Find the 
expected number E of points scored each time the person fires. 


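The probability of scoring 10, 5, 3, or 0 points follows: 

$$f(10) = \frac{1}{2}\cdot\frac{\text{area of 10 points}}{\text{area of target}} = \frac{1}{2}\cdot\frac{\pi(1)^2}{\pi(5)^2} = \frac{1}{50}$$
$$f(5) = \frac{1}{2}\cdot\frac{\text{area of 5 points}}{\text{area of target}} = \frac{1}{2}\cdot\frac{\pi(3)^2 - \pi(1)^2}{\pi(5)^2} = \frac{8}{50}$$
$$f(3) = \frac{1}{2}\cdot\frac{\text{area of 3 points}}{\text{area of target}} = \frac{1}{2}\cdot\frac{\pi(5)^2 - \pi(3)^2}{\pi(5)^2} = \frac{16}{50}$$
$$f(0) = \frac{1}{2}$$

Fig. 5-14 

Thus, $E = 10\left(\tfrac{1}{50}\right) + 5\left(\tfrac{8}{50}\right) + 3\left(\tfrac{16}{50}\right) + 0\left(\tfrac{1}{2}\right) = \tfrac{98}{50} = 1.96$. 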


5.13. A coin is weighted so that P(H) = p and hence P(T) = q = 1 - p. The coin is tossed until a 
head appears. Let E denote the expected number of tosses. Prove E = 1/p. 
The sample space is 

$$S = \{H, TH, TTH, \ldots, T^nH, \ldots, T^\infty\}$$

where $T^n$ denotes n tosses of T and $T^\infty$ denotes the case that heads never appears. (This would happen 
if the coin were two-tailed.) Let X denote the number of tosses. Accordingly, X assumes the values 
$1, 2, 3, \ldots, \infty$ with corresponding probabilities 

$$p, \; qp, \; q^2p, \; \ldots, \; q^np, \; \ldots, \; 0$$

Thus 

$$E = \sum_{n=1}^{\infty} n q^{n-1} p = p \sum_{n=1}^{\infty} n q^{n-1}$$

Let 

$$y = \sum_{n=0}^{\infty} q^n = \frac{1}{1-q}$$

The derivative with respect to q yields 

$$\frac{dy}{dq} = \sum_{n=1}^{\infty} n q^{n-1} = \frac{1}{(1-q)^2}$$

Substituting this value of $\sum n q^{n-1}$ in the formula for E yields 

$$E = \frac{p}{(1-q)^2} = \frac{p}{p^2} = \frac{1}{p}$$

Note that calculus is used to evaluate the infinite series. 


Remark: This is an example of an infinite discrete sample space. 
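
The closed form E = 1/p can be sanity-checked numerically by truncating the infinite series. This is only a sketch under the assumption 0 < p <= 1 (so that the series converges); the function name and the choice of 500 terms are arbitrary.

```python
def expected_tosses_partial(p, terms):
    """Partial sum of sum_{n>=1} n * q**(n-1) * p, the expected number of tosses."""
    q = 1.0 - p
    return sum(n * q ** (n - 1) * p for n in range(1, terms + 1))


for p in (0.5, 0.25, 0.1):
    approx = expected_tosses_partial(p, terms=500)
    print(p, approx, 1.0 / p)
```

For each p the truncated sum should agree with 1/p up to floating-point error, since the neglected tail shrinks geometrically.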


5.14. A linear array EMPLOYEE has n elements. Suppose NAME appears randomly in the array, 
and there is a linear search to find the location K of NAME, that is, to find K such that 
EMPLOYEE[K] = NAME. Let f(n) denote the number of comparisons in the linear 
search. 

(a) Find the expected value of f(n). 

(b) Find the maximum value (worst case) of f(n). 


(a) Let X denote the number of comparisons. Since NAME can appear in any position in the array with 
the same probability of 1/n, we have X = 1, 2, 3, ..., n, each with probability 1/n. Hence 

$$f(n) = E(X) = 1\cdot\frac{1}{n} + 2\cdot\frac{1}{n} + 3\cdot\frac{1}{n} + \cdots + n\cdot\frac{1}{n} = (1 + 2 + \cdots + n)\cdot\frac{1}{n} = \frac{n(n+1)}{2}\cdot\frac{1}{n} = \frac{n+1}{2}$$


(b) If NAME appears at the end of the array, then f(n) =n, which is the worst possible case. 
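
The average and worst cases can be observed directly by counting comparisons in an actual linear search. The sketch below is illustrative: the array holds distinct integers standing in for names, and `comparisons` and `average_comparisons` are hypothetical helper names.

```python
from fractions import Fraction


def comparisons(array, target):
    """Number of comparisons a left-to-right linear search makes to find target."""
    for count, item in enumerate(array, start=1):
        if item == target:
            return count
    raise ValueError("target not in array")


def average_comparisons(n):
    """Average over all n equally likely positions of the target."""
    array = list(range(n))
    return Fraction(sum(comparisons(array, t) for t in array), n)


n = 7
employees = list(range(n))
print(average_comparisons(n))              # compare with (n + 1) / 2
print(comparisons(employees, employees[-1]))  # worst case: target is last
```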


MEAN, VARIANCE, AND STANDARD DEVIATION 


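5.15. Find the mean $\mu = E(X)$, variance $\sigma^2 = \mathrm{var}(X)$, and standard deviation $\sigma = \sigma_X$ of each 
distribution: 

(a) x    | 2    3    11          (b) x    | 1    3    4    5
    f(x) | 1/3  1/2  1/6             f(x) | 0.4  0.1  0.2  0.3


Use the formulas 

$$\mu = E(X) = x_1 f(x_1) + x_2 f(x_2) + \cdots + x_m f(x_m) = \sum x_i f(x_i)$$
$$E(X^2) = x_1^2 f(x_1) + x_2^2 f(x_2) + \cdots + x_m^2 f(x_m) = \sum x_i^2 f(x_i)$$

Then use the formulas 

$$\sigma^2 = \mathrm{var}(X) = E(X^2) - \mu^2 \qquad \text{and} \qquad \sigma = \sigma_X = \sqrt{\mathrm{var}(X)}$$

to obtain $\sigma^2 = \mathrm{var}(X)$ and $\sigma$. 


(a) Use the above formulas to first obtain: 

$$\mu = \sum x_i f(x_i) = 2\left(\tfrac{1}{3}\right) + 3\left(\tfrac{1}{2}\right) + 11\left(\tfrac{1}{6}\right) = 4$$
$$E(X^2) = \sum x_i^2 f(x_i) = 2^2\left(\tfrac{1}{3}\right) + 3^2\left(\tfrac{1}{2}\right) + 11^2\left(\tfrac{1}{6}\right) = 26$$

Then $\sigma^2 = \mathrm{var}(X) = E(X^2) - \mu^2 = 26 - 16 = 10$ 
and $\sigma = \sqrt{\mathrm{var}(X)} = \sqrt{10} \approx 3.2$ 

(b) Use the above formulas to first obtain: 

$$\mu = \sum x_i f(x_i) = 1(0.4) + 3(0.1) + 4(0.2) + 5(0.3) = 3$$
$$E(X^2) = \sum x_i^2 f(x_i) = 1(0.4) + 9(0.1) + 16(0.2) + 25(0.3) = 12$$

Then $\sigma^2 = \mathrm{var}(X) = E(X^2) - \mu^2 = 12 - 9 = 3$ 
and $\sigma = \sqrt{\mathrm{var}(X)} = \sqrt{3} \approx 1.7$ 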
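
The two formulas used throughout this section translate into a few lines of code. A minimal sketch follows, assuming a finite distribution stored as a dictionary from values to probabilities; the name `mean_var_sd` is introduced here for illustration.

```python
from fractions import Fraction
from math import sqrt


def mean_var_sd(dist):
    """Return (mean, variance, standard deviation) of a finite distribution {x: f(x)}."""
    mean = sum(x * p for x, p in dist.items())
    second_moment = sum(x * x * p for x, p in dist.items())
    variance = second_moment - mean ** 2      # var(X) = E(X^2) - mu^2
    return mean, variance, sqrt(variance)


dist_a = {2: Fraction(1, 3), 3: Fraction(1, 2), 11: Fraction(1, 6)}
dist_b = {1: Fraction(4, 10), 3: Fraction(1, 10), 4: Fraction(2, 10), 5: Fraction(3, 10)}

print(mean_var_sd(dist_a))
print(mean_var_sd(dist_b))
```

Probabilities are entered as fractions so that the mean and variance are exact; only the square root is a floating-point value.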


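5.16. Find the mean $\mu = E(X)$, variance $\sigma^2 = \mathrm{var}(X)$, and standard deviation $\sigma = \sigma_X$ of each 
distribution: 

(a) x_i | -5   -4   1    2          (b) x_i | 1    3    5    7
    p_i | 1/4  1/8  1/2  1/8            p_i | 0.3  0.1  0.4  0.2


Here the distribution is presented using $x_i$ and $p_i$ instead of x and f(x). The following are the 
analogous formulas: 

$$\mu = E(X) = x_1 p_1 + x_2 p_2 + \cdots + x_m p_m = \sum x_i p_i$$
$$E(X^2) = x_1^2 p_1 + x_2^2 p_2 + \cdots + x_m^2 p_m = \sum x_i^2 p_i$$

Then, as before, 

$$\sigma^2 = \mathrm{var}(X) = E(X^2) - \mu^2 \qquad \text{and} \qquad \sigma = \sigma_X = \sqrt{\mathrm{var}(X)}$$

(a) $\mu = E(X) = \sum x_i p_i = -5\left(\tfrac{1}{4}\right) - 4\left(\tfrac{1}{8}\right) + 1\left(\tfrac{1}{2}\right) + 2\left(\tfrac{1}{8}\right) = -1$ 
$E(X^2) = \sum x_i^2 p_i = 25\left(\tfrac{1}{4}\right) + 16\left(\tfrac{1}{8}\right) + 1\left(\tfrac{1}{2}\right) + 4\left(\tfrac{1}{8}\right) = 9.25$ 
Then $\sigma^2 = \mathrm{var}(X) = E(X^2) - \mu^2 = 9.25 - (-1)^2 = 8.25$ 
and $\sigma = \sqrt{\mathrm{var}(X)} = \sqrt{8.25} \approx 2.9$ 


(b) $\mu = E(X) = \sum x_i p_i = 1(0.3) + 3(0.1) + 5(0.4) + 7(0.2) = 4.0$ 
$E(X^2) = \sum x_i^2 p_i = 1^2(0.3) + 3^2(0.1) + 5^2(0.4) + 7^2(0.2) = 21.0$ 
Then $\sigma^2 = \mathrm{var}(X) = E(X^2) - \mu^2 = 21 - (4)^2 = 5$ 
and $\sigma = \sqrt{\mathrm{var}(X)} = \sqrt{5} \approx 2.24$ 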


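5.17. A fair die is tossed yielding the equiprobable space 
S = {1,2,3,4,5,6} 


Let X denote twice the number appearing. Find the distribution f, mean $\mu_X$, variance $\sigma_X^2$, and 
standard deviation $\sigma_X$ of X. 

Here X(1) = 2, X(2) = 4, X(3) = 6, X(4) = 8, X(5) = 10, X(6) = 12. Also, each number has 
probability 1/6. Thus, the following is the distribution f of X: 

x    | 2    4    6    8    10   12
f(x) | 1/6  1/6  1/6  1/6  1/6  1/6


Accordingly 

$$\mu_X = E(X) = \sum x_i f(x_i) = 2\left(\tfrac{1}{6}\right) + 4\left(\tfrac{1}{6}\right) + 6\left(\tfrac{1}{6}\right) + 8\left(\tfrac{1}{6}\right) + 10\left(\tfrac{1}{6}\right) + 12\left(\tfrac{1}{6}\right) = \tfrac{42}{6} = 7$$
$$E(X^2) = \sum x_i^2 f(x_i) = 4\left(\tfrac{1}{6}\right) + 16\left(\tfrac{1}{6}\right) + 36\left(\tfrac{1}{6}\right) + 64\left(\tfrac{1}{6}\right) + 100\left(\tfrac{1}{6}\right) + 144\left(\tfrac{1}{6}\right) = \tfrac{364}{6} \approx 60.7$$

Then $\sigma_X^2 = \mathrm{var}(X) = E(X^2) - \mu_X^2 = 60.7 - (7)^2 = 11.7$ 
and $\sigma_X = \sqrt{\mathrm{var}(X)} = \sqrt{11.7} \approx 3.4$ 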


5.18. A fair die is tossed yielding the equiprobable space 

S = {1,2,3,4,5,6} 
Let Y be 1 or 3 accordingly as an odd or even number appears. Find the distribution g, 
expectation $\mu_Y$, variance $\sigma_Y^2$, and standard deviation $\sigma_Y$ of Y. 


Here Y(1) = 1, Y(2) = 3, Y(3) = 1, Y(4) = 3, Y(5) = 1, Y(6) = 3. Then $R_Y = \{1, 3\}$ is the range 
space of Y. Therefore 

$$g(1) = P(Y = 1) = P(\{1, 3, 5\}) = \tfrac{3}{6} = \tfrac{1}{2} \qquad \text{and} \qquad g(3) = P(Y = 3) = P(\{2, 4, 6\}) = \tfrac{3}{6} = \tfrac{1}{2}$$

Thus, the distribution g of Y is as follows: 

y    | 1    3
g(y) | 1/2  1/2


Accordingly, 

$$\mu_Y = E(Y) = \sum y_i g(y_i) = 1\left(\tfrac{1}{2}\right) + 3\left(\tfrac{1}{2}\right) = 2$$
$$E(Y^2) = \sum y_i^2 g(y_i) = 1\left(\tfrac{1}{2}\right) + 9\left(\tfrac{1}{2}\right) = 5$$

Then $\sigma_Y^2 = \mathrm{var}(Y) = E(Y^2) - \mu_Y^2 = 5 - (2)^2 = 1$ 
and $\sigma_Y = \sqrt{\mathrm{var}(Y)} = \sqrt{1} = 1$ 


5.19. Let X and Y be the random variables in Problems 5.17 and 5.18 which are defined on the same 
sample space S. Recall that Z = X + Y is the random variable on S defined by 


Z(s) = X(s) + Y(s) 


Find the distribution, expectation, variance, and standard deviation of Z = X + Y. Also, verify 
that E(Z) = E(X + Y) = E(X) + E(Y). 


The sample space is still S = {1,2,3,4,5,6} and each sample point still has probability 1/6. Use 
Z(s) = X(s) + Y(s) and the values of X and Y from Problems 5.17 and 5.18 to obtain 


Z(1) = X(1) + Y(1) = 2 + 1 = 3          Z(4) = X(4) + Y(4) = 8 + 3 = 11 
Z(2) = X(2) + Y(2) = 4 + 3 = 7          Z(5) = X(5) + Y(5) = 10 + 1 = 11 
Z(3) = X(3) + Y(3) = 6 + 1 = 7          Z(6) = X(6) + Y(6) = 12 + 3 = 15 


The range space of Z is $R_Z = \{3, 7, 11, 15\}$. Also, 3 and 15 are each assumed at only one sample point and 
hence have a probability 1/6; whereas 7 and 11 are each assumed at two sample points and hence have a 
probability 2/6. Thus, the distribution of Z = X + Y is as follows: 

z_i    | 3    7    11   15
P(z_i) | 1/6  2/6  2/6  1/6


Therefore 

$$\mu_Z = E(Z) = \sum z_i P(z_i) = 3\left(\tfrac{1}{6}\right) + 7\left(\tfrac{2}{6}\right) + 11\left(\tfrac{2}{6}\right) + 15\left(\tfrac{1}{6}\right) = \tfrac{54}{6} = 9$$
$$E(Z^2) = \sum z_i^2 P(z_i) = 9\left(\tfrac{1}{6}\right) + 49\left(\tfrac{2}{6}\right) + 121\left(\tfrac{2}{6}\right) + 225\left(\tfrac{1}{6}\right) = \tfrac{574}{6} \approx 95.7$$

Then $\sigma_Z^2 = \mathrm{var}(Z) = E(Z^2) - \mu_Z^2 = 95.7 - (9)^2 = 14.7$ 
and $\sigma_Z = \sqrt{\mathrm{var}(Z)} = \sqrt{14.7} \approx 3.8$ 
Moreover, E(Z) = E(X + Y) = 9 = 7 + 2 = E(X) + E(Y). On the other hand, 
$\mathrm{var}(X) + \mathrm{var}(Y) = 11.7 + 1 = 12.7 \ne \mathrm{var}(Z)$ 


5.20. Let X and Y be the random variables in Problems 5.17 and 5.18 which are defined on the same 
sample space S. Recall that W = XY is the random variable on S defined by 


W(s) = X(s)Y(s) 
Find the distribution, expectation, variance, and standard deviation of W = XY. 


The sample space is still S = {1,2,3,4,5,6} and each sample point still has a probability 1/6. Use 
W(s) = (XY)(s) = X(s)Y(s) and the values of X and Y from Problems 5.17 and 5.18 to obtain 
W(1) = X(1)Y(1) = 2(1) = 2,          W(4) = X(4)Y(4) = 8(3) = 24 
W(2) = X(2)Y(2) = 4(3) = 12,         W(5) = X(5)Y(5) = 10(1) = 10 
W(3) = X(3)Y(3) = 6(1) = 6,          W(6) = X(6)Y(6) = 12(3) = 36 
Each value of W = XY is assumed at just one sample point; hence the distribution of W is as follows: 

w_i    | 2    6    10   12   24   36
P(w_i) | 1/6  1/6  1/6  1/6  1/6  1/6


Therefore 

$$\mu_W = E(W) = \sum w_i P(w_i) = \tfrac{2}{6} + \tfrac{6}{6} + \tfrac{10}{6} + \tfrac{12}{6} + \tfrac{24}{6} + \tfrac{36}{6} = \tfrac{90}{6} = 15$$
$$E(W^2) = \sum w_i^2 P(w_i) = \tfrac{4}{6} + \tfrac{36}{6} + \tfrac{100}{6} + \tfrac{144}{6} + \tfrac{576}{6} + \tfrac{1296}{6} = \tfrac{2156}{6} \approx 359.3$$

Then $\sigma_W^2 = \mathrm{var}(W) = E(W^2) - \mu_W^2 = 359.3 - (15)^2 = 134.3$ 
and $\sigma_W = \sqrt{\mathrm{var}(W)} = \sqrt{134.3} \approx 11.6$ 
[Note: $E(W) = E(XY) = 15 \ne E(X)E(Y) = 7(2) = 14$] 
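
Problems 5.17 to 5.20 all work on the same six-point space, so the sum and product variables can be built pointwise and their moments computed directly from the sample points. The sketch below is illustrative (the helper names `expectation` and `variance` are not from the text) and keeps everything as exact fractions.

```python
from fractions import Fraction

S = [1, 2, 3, 4, 5, 6]                      # equiprobable die outcomes
X = {s: 2 * s for s in S}                   # twice the number appearing
Y = {s: 1 if s % 2 else 3 for s in S}       # 1 for odd, 3 for even


def expectation(rv):
    """E of a random variable given as {sample point: value} on the equiprobable S."""
    return sum(Fraction(rv[s], len(S)) for s in S)


def variance(rv):
    mu = expectation(rv)
    return expectation({s: rv[s] ** 2 for s in S}) - mu ** 2


Z = {s: X[s] + Y[s] for s in S}             # pointwise sum
W = {s: X[s] * Y[s] for s in S}             # pointwise product

print(expectation(Z), expectation(X) + expectation(Y))
print(expectation(W), expectation(X) * expectation(Y))
print(float(variance(Z)), float(variance(X) + variance(Y)))
```

The first printed pair agrees (expectation is additive), while the other two pairs differ, which is the point of the notes closing Problems 5.19 and 5.20.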


5.21. Let X be a random variable with distribution 


x    | 1    2    3
P(x) | 0.3  0.5  0.2


Find the mean $\mu_X$, variance $\sigma_X^2$, and standard deviation $\sigma_X$ of X. 
The formulas for $\mu_X$ and $E(X^2)$ yield 

$$\mu_X = E(X) = \sum x_i P(x_i) = 1(0.3) + 2(0.5) + 3(0.2) = 1.9$$
$$E(X^2) = \sum x_i^2 P(x_i) = 1^2(0.3) + 2^2(0.5) + 3^2(0.2) = 4.1$$

Then $\sigma_X^2 = \mathrm{var}(X) = E(X^2) - \mu_X^2 = 4.1 - (1.9)^2 = 0.49$ 
and $\sigma_X = \sqrt{\mathrm{var}(X)} = \sqrt{0.49} = 0.7$ 


5.22. Consider the random variable X in the preceding Problem 5.21. Find the distribution, 
mean $\mu_Y$, variance $\sigma_Y^2$, and standard deviation $\sigma_Y$ of the random variable $Y = \Phi(X)$ where 
(a) $\Phi(x) = x^3$, (b) $\Phi(x) = 2^x$, (c) $\Phi(x) = x^2 + 3x + 4$. 


The distribution of any arbitrary random variable $Y = \Phi(X)$ where P(y) = P(x) is as follows: 

y    | Φ(1)  Φ(2)  Φ(3)
P(y) | 0.3   0.5   0.2

(a) Using $1^3 = 1$, $2^3 = 8$, $3^3 = 27$, the distribution of $Y = X^3$ is as follows: 

y    | 1    8    27
P(y) | 0.3  0.5  0.2


Therefore 

$$\mu_Y = E(Y) = \sum \Phi(x_i) P(x_i) = \sum y_i P(y_i) = 1(0.3) + 8(0.5) + 27(0.2) = 9.7$$
$$E(Y^2) = \sum y_i^2 P(y_i) = 1^2(0.3) + 8^2(0.5) + 27^2(0.2) = 178.1$$

Then $\sigma_Y^2 = \mathrm{var}(Y) = E(Y^2) - \mu_Y^2 = 178.1 - (9.7)^2 = 84.0$ 
and $\sigma_Y = \sqrt{\mathrm{var}(Y)} = \sqrt{84.0} \approx 9.17$ 

(b) Using $2^1 = 2$, $2^2 = 4$, $2^3 = 8$, the distribution of $Y = 2^X$ is as follows: 

y    | 2    4    8
P(y) | 0.3  0.5  0.2


Therefore 

$$\mu_Y = E(Y) = \sum y_i P(y_i) = 2(0.3) + 4(0.5) + 8(0.2) = 4.2$$
$$E(Y^2) = \sum y_i^2 P(y_i) = 2^2(0.3) + 4^2(0.5) + 8^2(0.2) = 22.0$$

Then $\sigma_Y^2 = \mathrm{var}(Y) = E(Y^2) - \mu_Y^2 = 22.0 - (4.2)^2 = 4.36$ 
and $\sigma_Y = \sqrt{\mathrm{var}(Y)} = \sqrt{4.36} \approx 2.09$ 


(c) Substitute x = 1, 2, 3 in $\Phi(x) = x^2 + 3x + 4$ to obtain $\Phi(1) = 8$, $\Phi(2) = 14$, $\Phi(3) = 22$. Then the 
distribution of $Y = X^2 + 3X + 4$ is as follows: 

y    | 8    14   22
P(y) | 0.3  0.5  0.2


Therefore 

$$\mu_Y = E(Y) = \sum y_i P(y_i) = 8(0.3) + 14(0.5) + 22(0.2) = 13.8$$
$$E(Y^2) = \sum y_i^2 P(y_i) = 8^2(0.3) + 14^2(0.5) + 22^2(0.2) = 214$$

Then $\sigma_Y^2 = \mathrm{var}(Y) = E(Y^2) - \mu_Y^2 = 214 - (13.8)^2 = 23.56$ 
and $\sigma_Y = \sqrt{\mathrm{var}(Y)} = \sqrt{23.56} \approx 4.85$ 
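
Because each part repeats the same three steps (transform the values, keep the probabilities, apply the two moment formulas), the arithmetic is easy to recheck by machine. The sketch below is only an illustration; `transformed_stats` and `X_DIST` are names introduced here, and the probabilities are entered as exact fractions.

```python
from fractions import Fraction
from math import sqrt

# Distribution of X from Problem 5.21.
X_DIST = {1: Fraction(3, 10), 2: Fraction(5, 10), 3: Fraction(2, 10)}


def transformed_stats(phi, dist):
    """Mean, variance, and standard deviation of Y = phi(X), using E(Y) = sum phi(x) f(x)."""
    mean = sum(phi(x) * p for x, p in dist.items())
    second_moment = sum(phi(x) ** 2 * p for x, p in dist.items())
    variance = second_moment - mean ** 2
    return mean, variance, sqrt(variance)


cases = {
    "(a) x**3": lambda x: x ** 3,
    "(b) 2**x": lambda x: 2 ** x,
    "(c) x**2 + 3x + 4": lambda x: x ** 2 + 3 * x + 4,
}
for label, phi in cases.items():
    mean, variance, sd = transformed_stats(phi, X_DIST)
    print(label, float(mean), float(variance), round(sd, 2))
```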


5.23. Let X be a random variable with distribution 


x        | 1    3    5    7
P(X = x) | 0.4  0.3  0.2  0.1


(a) Find the mean $\mu_X$, variance $\sigma_X^2$, and standard deviation $\sigma_X$ of X. 


(b) Find the distribution of the standardized random variable $Z = (X - \mu)/\sigma$ of X, and show 
that $\mu_Z = 0$ and $\sigma_Z = 1$ (as predicted by Theorem 5.6). 


(a) The formulas for $\mu_X$ and $E(X^2)$ yield 

$$\mu_X = E(X) = \sum x_i P(x_i) = 1(0.4) + 3(0.3) + 5(0.2) + 7(0.1) = 3$$
$$E(X^2) = \sum x_i^2 P(x_i) = 1^2(0.4) + 3^2(0.3) + 5^2(0.2) + 7^2(0.1) = 13$$

Then $\sigma_X^2 = E(X^2) - \mu_X^2 = 13 - (3)^2 = 4$ and $\sigma_X = 2$ 


(b) Using $Z = (X - \mu)/\sigma = (x - 3)/2$ and P(z) = P(x), we obtain the following distribution of Z: 

z    | -1   0    1    2
P(z) | 0.4  0.3  0.2  0.1


Therefore 

$$\mu_Z = \sum z_i P(z_i) = -1(0.4) + 0(0.3) + 1(0.2) + 2(0.1) = 0$$
$$E(Z^2) = \sum z_i^2 P(z_i) = (-1)^2(0.4) + 0^2(0.3) + 1^2(0.2) + 2^2(0.1) = 1$$

Then $\sigma_Z^2 = E(Z^2) - \mu_Z^2 = 1 - (0)^2 = 1$ and $\sigma_Z = 1$ 
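

5.24. Let X denote the number of heads when a fair coin is tossed 4 times. By Problem 5.2, the 
mean $\mu_X = 2$ and its distribution is as follows: 

x    | 0     1     2     3     4
P(x) | 1/16  4/16  6/16  4/16  1/16


(a) Find the standard deviation $\sigma_X$ of X. 


(b) Find the distribution of the standardized random variable $Z = (X - \mu)/\sigma$ of X, and show 
that $\mu_Z = 0$ and $\sigma_Z = 1$ (as predicted by Theorem 5.6). 


(a) First compute $E(X^2)$ as follows: 

$$E(X^2) = \sum x_i^2 P(x_i) = 0\left(\tfrac{1}{16}\right) + 1\left(\tfrac{4}{16}\right) + 4\left(\tfrac{6}{16}\right) + 9\left(\tfrac{4}{16}\right) + 16\left(\tfrac{1}{16}\right) = \tfrac{80}{16} = 5$$

Using $\mu_X = 2$, we obtain 
$\sigma_X^2 = E(X^2) - \mu_X^2 = 5 - (2)^2 = 1$ and $\sigma_X = 1$ 
(b) Using $Z = (X - \mu)/\sigma = (x - 2)/1$ and P(z) = P(x), we obtain the following distribution of Z: 

z    | -2    -1    0     1     2
P(z) | 1/16  4/16  6/16  4/16  1/16


Therefore 

$$\mu_Z = \sum z_i P(z_i) = -2\left(\tfrac{1}{16}\right) - 1\left(\tfrac{4}{16}\right) + 0\left(\tfrac{6}{16}\right) + 1\left(\tfrac{4}{16}\right) + 2\left(\tfrac{1}{16}\right) = 0$$
$$E(Z^2) = \sum z_i^2 P(z_i) = 4\left(\tfrac{1}{16}\right) + 1\left(\tfrac{4}{16}\right) + 0\left(\tfrac{6}{16}\right) + 1\left(\tfrac{4}{16}\right) + 4\left(\tfrac{1}{16}\right) = \tfrac{16}{16} = 1$$

Then $\sigma_Z^2 = E(Z^2) - \mu_Z^2 = 1 - (0)^2 = 1$ and $\sigma_Z = 1$ 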


JOINT DISTRIBUTION AND INDEPENDENT RANDOM VARIABLES 
5.25. Let X and Y be random variables with joint distribution as shown in Fig. 5-15. 
(a) Find the distributions of X and Y. 
(b) Find cov(X, Y), the covariance of X and Y. 
(c) Find p(X, Y), the correlation of X and Y. 
(d) Are X and Y independent random variables? 


Fig. 5-15 


(a) The marginal distribution on the right of the joint distribution is the distribution of X, and the 
marginal distribution on the bottom is the distribution of Y. Thus, the distributions of X and Y are 
as follows: 

x    | 1    3             y    | -3   2    4
f(x) | 0.5  0.5           g(y) | 0.4  0.3  0.3
Distribution of X         Distribution of Y 


(b) First compute $\mu_X$ and $\mu_Y$ as follows: 

$$\mu_X = \sum x_i f(x_i) = 1(0.5) + 3(0.5) = 2$$
$$\mu_Y = \sum y_j g(y_j) = -3(0.4) + 2(0.3) + 4(0.3) = 0.6$$

Next compute E(XY) as follows: 

$$E(XY) = \sum x_i y_j h(x_i, y_j) = 1(-3)(0.1) + 1(2)(0.2) + 1(4)(0.2) + 3(-3)(0.3) + 3(2)(0.1) + 3(4)(0.1) = 0$$

Then $\mathrm{cov}(X, Y) = E(XY) - \mu_X \mu_Y = 0 - 2(0.6) = -1.2$ 
(c) First compute $\sigma_X$ as follows: 

$$E(X^2) = \sum x_i^2 f(x_i) = 1^2(0.5) + 3^2(0.5) = 5$$
$$\sigma_X^2 = E(X^2) - \mu_X^2 = 5 - (2)^2 = 1 \qquad \text{and} \qquad \sigma_X = 1$$

Next compute $\sigma_Y$ as follows: 

$$E(Y^2) = \sum y_j^2 g(y_j) = (-3)^2(0.4) + 2^2(0.3) + 4^2(0.3) = 9.6$$
$$\sigma_Y^2 = E(Y^2) - \mu_Y^2 = 9.6 - (0.6)^2 = 9.24 \qquad \text{and} \qquad \sigma_Y \approx 3.0$$

Then 

$$\rho(X, Y) = \frac{\mathrm{cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{-1.2}{(1)(3.0)} = -0.4$$

(d) X and Y are not independent since the entry h(1, -3) = 0.1 is not equal to 
f(1)g(-3) = (0.5)(0.4) = 0.2, the product of its marginal entries, that is, 

$$P(X = 1, Y = -3) \ne P(X = 1)P(Y = -3)$$
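
All four parts of this problem can be derived from the joint table alone. The following sketch is illustrative: the six joint entries are read off the six terms of the E(XY) computation in part (b), and the function names (`marginals`, `covariance`, `correlation`, `independent`) are introduced here, not taken from the text.

```python
from fractions import Fraction
from math import sqrt

# Joint distribution h(x, y), taken from the terms of E(XY) in part (b).
h = {(1, -3): Fraction(1, 10), (1, 2): Fraction(2, 10), (1, 4): Fraction(2, 10),
     (3, -3): Fraction(3, 10), (3, 2): Fraction(1, 10), (3, 4): Fraction(1, 10)}


def marginals(h):
    """Row sums give f (distribution of X); column sums give g (distribution of Y)."""
    f, g = {}, {}
    for (x, y), p in h.items():
        f[x] = f.get(x, 0) + p
        g[y] = g.get(y, 0) + p
    return f, g


def moment(dist, r):
    return sum(v ** r * p for v, p in dist.items())


def covariance(h):
    f, g = marginals(h)
    e_xy = sum(x * y * p for (x, y), p in h.items())
    return e_xy - moment(f, 1) * moment(g, 1)


def correlation(h):
    f, g = marginals(h)
    var_x = moment(f, 2) - moment(f, 1) ** 2
    var_y = moment(g, 2) - moment(g, 1) ** 2
    return float(covariance(h)) / sqrt(var_x * var_y)


def independent(h):
    """True only if every joint entry equals the product of its marginal entries."""
    f, g = marginals(h)
    return all(h.get((x, y), 0) == f[x] * g[y] for x in f for y in g)


print(covariance(h), round(correlation(h), 2), independent(h))
```

Without rounding the standard deviation of Y to 3.0, the correlation comes out near -0.39, which agrees with the text's -0.4 to one decimal place.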


5.26. Let X and Y be independent random variables with the following distributions: 

x    | 1    2             y    | 5    10   15
f(x) | 0.6  0.4           g(y) | 0.2  0.5  0.3
Distribution of X         Distribution of Y 


Find the joint distribution h of X and Y. 


Since X and Y are independent, the joint distribution h can be obtained from the marginal 
distributions f and g. Specifically, first construct the joint distribution table with only the marginal 
distributions as shown in Fig. 5-16(a). Then multiply the marginal entries to obtain the interior entries, 
that is, set $h(x_i, y_j) = f(x_i) g(y_j)$. This yields the joint distribution of X and Y appearing in Fig. 5-16(b). 

      | Y = 5   10     15    | Sum
X = 1 | 0.12    0.30   0.18  | 0.6
X = 2 | 0.08    0.20   0.12  | 0.4
Sum   | 0.2     0.5    0.3   |

Fig. 5-16 (b) 


5.27. A fair coin is tossed 3 times yielding the following 8-element equiprobable space: 
S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT} 


(Thus, each point in S occurs with probability 1/8.) Let X equal 0 or 1 accordingly as a head or 
a tail occurs on the first toss, and let Y equal the total number of heads that occurs. 

(a) Find the distribution f of X and the distribution g of Y. 

(b) Find the joint distribution h of X and Y. 

(c) Determine whether or not X and Y are independent. 

(d) Find cov(X, Y), the covariance of X and Y. 


(a) We have X(HHH) = 0, X(HHT) = 0, X(HTH) = 0, X(HTT) = 0 
X(THH) = 1, X(THT) = 1, X(TTH) = 1, X(TTT) = 1 
Also Y(HHH) = 3, Y(HHT) = 2, Y(HTH) = 2, Y(HTT) = 1 
Y(THH) = 2, Y(THT) = 1, Y(TTH) = 1, Y(TTT) = 0 


Thus, the distributions of X and Y are as follows: 

x    | 0    1             y    | 0    1    2    3
f(x) | 1/2  1/2           g(y) | 1/8  3/8  3/8  1/8
Distribution of X         Distribution of Y 


(b) The joint distribution h of X and Y appears in Fig. 5-17. The entry h(0, 2) is obtained using 
h(0, 2) = P(X = 0, Y = 2) = P({HTH, HHT}) = 2/8 


The other entries are obtained similarly. 


Fig. 5-17 

(c) X and Y are not independent. For example, the entry h(0, 0) = 0 is not equal to f(0)g(0) = (1/2)(1/8), the 
product of the marginal entries. That is, 

$$P(X = 0, Y = 0) \ne P(X = 0)P(Y = 0)$$

(d) First compute $\mu_X$, $\mu_Y$, and E(XY) as follows: 

$$\mu_X = \sum x_i f(x_i) = 0\left(\tfrac{1}{2}\right) + 1\left(\tfrac{1}{2}\right) = \tfrac{1}{2}$$
$$\mu_Y = \sum y_j g(y_j) = 0\left(\tfrac{1}{8}\right) + 1\left(\tfrac{3}{8}\right) + 2\left(\tfrac{3}{8}\right) + 3\left(\tfrac{1}{8}\right) = \tfrac{3}{2}$$
$$E(XY) = \sum x_i y_j h(x_i, y_j) = 1(1)\left(\tfrac{2}{8}\right) + 1(2)\left(\tfrac{1}{8}\right) + \text{terms with a factor } 0 = \tfrac{1}{2}$$

Then 

$$\mathrm{cov}(X, Y) = E(XY) - \mu_X \mu_Y = \tfrac{1}{2} - \tfrac{1}{2}\cdot\tfrac{3}{2} = -\tfrac{1}{4}$$
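
When the random variables are defined on a small equiprobable space, the joint distribution can be tabulated by walking through the sample points. A minimal sketch, assuming three independent tosses of a fair coin; the variable names are illustrative.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

outcomes = ["".join(t) for t in product("HT", repeat=3)]   # HHH, HHT, ..., TTT
p_point = Fraction(1, len(outcomes))                        # each point has probability 1/8

# h[(x, y)] accumulates P(X = x, Y = y).
h = defaultdict(Fraction)
for s in outcomes:
    x = 0 if s[0] == "H" else 1      # X: 0 for a head, 1 for a tail on the first toss
    y = s.count("H")                 # Y: total number of heads
    h[(x, y)] += p_point

mu_x = sum(x * p for (x, _), p in h.items())
mu_y = sum(y * p for (_, y), p in h.items())
e_xy = sum(x * y * p for (x, y), p in h.items())
cov_xy = e_xy - mu_x * mu_y

for x in (0, 1):
    print([str(h.get((x, y), Fraction(0))) for y in range(4)])
print("cov(X, Y) =", cov_xy)
```

The two printed rows correspond to X = 0 and X = 1, with columns Y = 0, 1, 2, 3.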


5.28. Let X and Y be the random variables in Problem 5.27. Recall that Z = X + Y is the random 
variable on the same sample space S defined by 


Z(s) = X(s) + Y(s) 

(a) Find the distribution of Z. 
(b) Show that E(Z) = E(X + Y) = E(X) + E(Y). 
(c) Find var(X), var(Y), and var(Z), and compare var(Z) with var(X) + var(Y). 


(a) Here X assumes the values 0 and 1, and Y assumes the values 0, 1, 2, 3; hence Z can only 
assume the values 0, 1, 2, 3, 4. Hence the range space of Z is $R_Z = \{0, 1, 2, 3, 4\}$. To find 
P(Z = z) = P(X + Y = z), we add up the corresponding probabilities from the joint distribution of 
X and Y. For instance, 

$$P(Z = 3) = P(X + Y = 3) = P(0, 3) + P(1, 2) = \tfrac{1}{8} + \tfrac{1}{8} = \tfrac{2}{8}$$

Similarly, we obtain the following distribution of Z: 

z    | 0  1    2    3    4
P(z) | 0  2/8  4/8  2/8  0


[Since P(Z = 0) = 0 and P(Z = 4) = 0, we may delete the first and last entries in the distribution 
of Z.] 


(b) From the distribution of Z we obtain 

$$\mu_Z = E(Z) = 1\left(\tfrac{2}{8}\right) + 2\left(\tfrac{4}{8}\right) + 3\left(\tfrac{2}{8}\right) = 2$$

From Problem 5.27, $E(X) = \mu_X = \tfrac{1}{2}$ and $E(Y) = \mu_Y = \tfrac{3}{2}$. Hence E(Z) = E(X) + E(Y) (which is 
expected from Theorem 5.3). 


(c) From the distributions of X, Y, Z we obtain 

$$E(X^2) = 0^2\left(\tfrac{1}{2}\right) + 1^2\left(\tfrac{1}{2}\right) = \tfrac{1}{2}$$
$$E(Y^2) = 0^2\left(\tfrac{1}{8}\right) + 1^2\left(\tfrac{3}{8}\right) + 2^2\left(\tfrac{3}{8}\right) + 3^2\left(\tfrac{1}{8}\right) = 3$$
$$E(Z^2) = 1^2\left(\tfrac{2}{8}\right) + 2^2\left(\tfrac{4}{8}\right) + 3^2\left(\tfrac{2}{8}\right) = \tfrac{9}{2}$$

Therefore: 

$$\mathrm{var}(X) = E(X^2) - \mu_X^2 = \tfrac{1}{2} - \left(\tfrac{1}{2}\right)^2 = \tfrac{1}{4}$$
$$\mathrm{var}(Y) = E(Y^2) - \mu_Y^2 = 3 - \left(\tfrac{3}{2}\right)^2 = \tfrac{3}{4}$$
$$\mathrm{var}(Z) = E(Z^2) - \mu_Z^2 = \tfrac{9}{2} - (2)^2 = \tfrac{1}{2}$$

Thus, $\mathrm{var}(Z) \ne \mathrm{var}(X) + \mathrm{var}(Y)$. (This may be expected since X and Y are not independent random 
variables.) 

5.29. A sample with replacement of size n = 2 is randomly selected from the numbers 1 to 5. This 
then yields the equiprobable space S consisting of all 25 ordered pairs (a,b) of numbers 
from 1 to 5. That is, 


S = {(1,1), (1,2), ..., (1,5), (2,1), ..., (5,5)} 


Let X = 0 if the first number is even and X = 1 otherwise; let Y = 1 if the second number is odd 
and Y = 0 otherwise. 


(a) Find the distributions of X and Y. 
(b) Find the joint distribution of X and Y. 
(c) Determine if X and Y are independent. 


(a) There are 10 sample points in which the first entry is even, that is, where 
a=2 or 4 and b =1, 2, 3, 4,5 


Thus, P(X = 0) = 10/25 = 0.4, and so P(X = 1) = 0.6. There are 15 sample points in which the 
second entry is odd, that is, where 


a=1,2,3,4,5 and b=1,3,5 


Thus, P(Y = 1) = 15/25 = 0.6, and so P(Y = 0) = 0.4. Therefore, the distributions of X and Y are 
as follows: 


x    | 0    1             y    | 0    1
P(x) | 0.4  0.6           P(y) | 0.4  0.6


(Note that X and Y are identically distributed.) 
(b) For the joint distribution of X and Y, we have 
P(0, 0) = P(a even, b even) = P{(2, 2), (2, 4), (4,2), (4, 4)} = 4/25 = 0.16 
P(0, 1) = P(a even, b odd) = P{(2, 1), (2,3), (2,5), (4, 1), (4,3), (4, 5)} = 6/25 = 0.24 


Similarly P(1, 0) = 6/25 = 0.24 and P(1, 1) = 9/25 = 0.36. Thus, Fig. 5-18 gives the joint distribution 
of X and Y. 

      | Y = 0   Y = 1 | Sum
X = 0 | 0.16    0.24  | 0.4
X = 1 | 0.24    0.36  | 0.6
Sum   | 0.4     0.6   |

Fig. 5-18 

(c) The products of the marginal entries do give the four interior entries; for example, 
P(0, 0) = 0.16 = (0.4)(0.4) = P(X = 0)P(Y = 0) 


Thus, X and Y are independent random variables, even though they are identically distributed. 


5.30. Let X and Y be the random variables in Problem 5.29, and let Z = X + Y. 


(a) Find the distribution of Z. 
(b) Show that E(Z) = E(X + Y) = E(X) + E(Y). 
(c) Find var(X), var(Y), and var(Z), and compare var(Z) with var(X) + var(Y). 


(a) Here X assumes the values 0 and 1 and Y assumes the values 0 and 1; hence Z = X + Y 
can only assume the values 0, 1, 2. Hence the range space of Z is $R_Z = \{0, 1, 2\}$. To find 
P(Z = z) = P(X + Y = z), we add up the corresponding probabilities from the joint distribution of 
X and Y. Thus 

P(Z = 0) = P(X = 0, Y = 0) = 0.16 

P(Z = 1) = P(X = 0, Y = 1) + P(X = 1, Y = 0) = 0.24 + 0.24 = 0.48 

P(Z = 2) = P(X = 1, Y = 1) = 0.36 


Thus, the distribution of Z is as follows: 

z    | 0     1     2
P(z) | 0.16  0.48  0.36


(b) From the distributions of X, Y, Z we obtain 
$\mu_X = E(X) = 0(0.4) + 1(0.6) = 0.6$          $\mu_Y = E(Y) = 0(0.4) + 1(0.6) = 0.6$ 
$\mu_Z = E(Z) = 0(0.16) + 1(0.48) + 2(0.36) = 1.2$ 
Hence E(Z) = E(X) + E(Y) (which is expected from Theorem 5.3). 

(c) From the distributions of X, Y, Z we obtain 
$E(X^2) = 0^2(0.4) + 1^2(0.6) = 0.6$,          $E(Y^2) = 0^2(0.4) + 1^2(0.6) = 0.6$ 
$E(Z^2) = 0^2(0.16) + 1^2(0.48) + 2^2(0.36) = 1.92$ 
Accordingly, $\mathrm{var}(X) = E(X^2) - \mu_X^2 = 0.6 - (0.6)^2 = 0.24$ 
$\mathrm{var}(Y) = E(Y^2) - \mu_Y^2 = 0.6 - (0.6)^2 = 0.24$ 
$\mathrm{var}(Z) = E(Z^2) - \mu_Z^2 = 1.92 - (1.2)^2 = 0.48$ 


Thus var(Z) = var(X) + var(Y). (This is expected since X and Y are independent random 
variables.) 


5.31. Let X be the random variable with the following distribution, and let $Y = X^2$: 

x    | -2   -1   1    2
f(x) | 1/4  1/4  1/4  1/4


(a) Find the distribution g of Y. 
(b) Find the joint distribution h of X and Y. 
(c) Find cov(X, Y) and $\rho(X, Y)$. 
(d) Determine whether or not X and Y are independent. 


(a) Since $Y = X^2$, the random variable Y only has the values 4 and 1, and each occurs with probability 
$\tfrac{1}{4} + \tfrac{1}{4} = \tfrac{1}{2}$. Thus, the distribution of Y is as follows: 

y    | 1    4
g(y) | 1/2  1/2


(b) The joint distribution h of X and Y appears in Fig. 5-19. Note that if X = -2, then Y = 4. Therefore 

$$h(-2, 1) = 0 \qquad \text{and} \qquad h(-2, 4) = f(-2) = \tfrac{1}{4}$$

The other entries are obtained in a similar manner. 


(c) First compute $\mu_X$, $\mu_Y$, and E(XY) as follows: 

$$\mu_X = \sum x_i f(x_i) = -2\left(\tfrac{1}{4}\right) - 1\left(\tfrac{1}{4}\right) + 1\left(\tfrac{1}{4}\right) + 2\left(\tfrac{1}{4}\right) = 0$$
$$\mu_Y = \sum y_j g(y_j) = 1\left(\tfrac{1}{2}\right) + 4\left(\tfrac{1}{2}\right) = \tfrac{5}{2}$$
$$E(XY) = \sum x_i y_j h(x_i, y_j) = -8\left(\tfrac{1}{4}\right) - 1\left(\tfrac{1}{4}\right) + 1\left(\tfrac{1}{4}\right) + 8\left(\tfrac{1}{4}\right) = 0$$

Then $\mathrm{cov}(X, Y) = E(XY) - \mu_X \mu_Y = 0 - 0 \cdot \tfrac{5}{2} = 0$ and so $\rho(X, Y) = 0$ 


(d) X and Y are not independent. For example, the entry h(-2, 1) = 0 is not equal to f(-2)g(1) = (1/4)(1/2), 
the product of the marginal entries. That is, 

$$P(X = -2, Y = 1) \ne P(X = -2)P(Y = 1)$$

Remark: Although X and Y are not independent and, in particular, Y is a function of X, this 
example shows that it is still possible for the covariance and correlation to be 0, which is always true when 
X and Y are independent. 


5.32. Let $X_1, X_2, X_3$ be independent random variables that are identically distributed with mean 
$\mu = 100$ and standard deviation $\sigma = 4$. Let $Y = (X_1 + X_2 + X_3)/3$. Find: 


(a) the mean $\mu_Y$ of Y and (b) the standard deviation $\sigma_Y$ of Y. 
(a) Theorem 5.2 and Corollary 5.4 yield 

$$\mu_Y = E(Y) = E\left(\frac{X_1 + X_2 + X_3}{3}\right) = \frac{E(X_1 + X_2 + X_3)}{3} = \frac{E(X_1) + E(X_2) + E(X_3)}{3} = \frac{100 + 100 + 100}{3} = 100$$

Note $\mu_Y = \mu$. 


(b) We use Theorem 5.6 and, since $X_1, X_2, X_3$ are independent, we can also use Theorem 5.9 to 
obtain 

$$\sigma_Y^2 = \mathrm{var}(Y) = \mathrm{var}\left(\frac{X_1 + X_2 + X_3}{3}\right) = \frac{\mathrm{var}(X_1 + X_2 + X_3)}{9} = \frac{\mathrm{var}(X_1) + \mathrm{var}(X_2) + \mathrm{var}(X_3)}{9} = \frac{4^2 + 4^2 + 4^2}{9} = \frac{48}{9} = \frac{16}{3}$$

Thus 

$$\sigma_Y = \sqrt{\frac{16}{3}} = \frac{4}{\sqrt{3}} = \frac{\sigma}{\sqrt{3}} \approx 2.31$$

Remark: Suppose Y were the average of n independent, identically distributed random variables with 
mean $\mu$ and standard deviation $\sigma$. Then one can similarly show that 

$$\mu_Y = \mu \qquad \text{and} \qquad \sigma_Y = \frac{\sigma}{\sqrt{n}}$$

That is, the above result is true in general. 


CHEBYSHEV’S INEQUALITY 


5.33. Suppose a random variable X has mean $\mu = 25$ and standard deviation $\sigma = 2$. Use 
Chebyshev’s inequality to estimate: (a) $P(X \le 35)$ and (b) $P(X \ge 20)$. 


(a) By Chebyshev’s inequality (Section 5.12), 

$$P(\mu - k\sigma \le X \le \mu + k\sigma) \ge 1 - \frac{1}{k^2}$$

Substitute $\mu = 25$, $\sigma = 2$ in $\mu + k\sigma$ and solve the equation 25 + 2k = 35 for k, getting 
k = 5. Then 

$$1 - \frac{1}{k^2} = 1 - \frac{1}{25} = \frac{24}{25} = 0.96$$

Since $\mu - k\sigma = 25 - 10 = 15$, Chebyshev’s inequality gives 

$$P(15 \le X \le 35) \ge 0.96$$

The event corresponding to $X \le 35$ contains as a subset the event corresponding to 
$15 \le X \le 35$. Therefore, 

$$P(X \le 35) \ge P(15 \le X \le 35) \ge 0.96$$

Hence, the probability that X is less than or equal to 35 is at least 96 percent. 

(b) Substitute $\mu = 25$, $\sigma = 2$ in $\mu - k\sigma$ and solve the equation 25 - 2k = 20 for k, getting 
k = 2.5. Then 

$$1 - \frac{1}{k^2} = 1 - \frac{1}{6.25} = 0.84$$

Since $\mu + k\sigma = 25 + 5 = 30$, Chebyshev’s inequality gives 

$$P(20 \le X \le 30) \ge 0.84$$

The event corresponding to $X \ge 20$ contains as a subset the event corresponding to 
$20 \le X \le 30$. Therefore, 

$$P(X \ge 20) \ge P(20 \le X \le 30) \ge 0.84$$

which says that the probability that X is greater than or equal to 20 is at least 84 percent. 


Remark: This problem illustrates that Chebyshev’s inequality can be used to estimate $P(X \le b)$ 
when $b \ge \mu$, and to estimate $P(X \ge a)$ when $a \le \mu$. 


5.34. Let X be a random variable with mean $\mu = 40$ and standard deviation $\sigma = 5$. Use Chebyshev’s 
inequality to find a value b for which $P(40 - b \le X \le 40 + b) \ge 0.95$. 


First solve $1 - \frac{1}{k^2} = 0.95$ for k as follows: 

$$0.05 = \frac{1}{k^2} \quad \text{or} \quad k^2 = \frac{1}{0.05} = 20 \quad \text{or} \quad k = \sqrt{20} = 2\sqrt{5}$$

Then, by Chebyshev’s inequality, $b = k\sigma = 10\sqrt{5} \approx 22.4$. Hence, $P(17.6 \le X \le 62.4) \ge 0.95$. 


5.35. Let X be a random variable with mean $\mu = 80$ and unknown standard deviation $\sigma$. Use 
Chebyshev’s inequality to find a value of $\sigma$ for which $P(75 \le X \le 85) \ge 0.9$. 
First solve $1 - \frac{1}{k^2} = 0.9$ for k as follows: 

$$0.1 = \frac{1}{k^2} \quad \text{or} \quad k^2 = \frac{1}{0.1} = 10 \quad \text{or} \quad k = \sqrt{10}$$

Now, since 75 is 5 units to the left of $\mu = 80$ and 85 is 5 units to the right of $\mu$, we can solve either 
$\mu - k\sigma = 75$ or $\mu + k\sigma = 85$ for $\sigma$. From the latter equation, we get 

$$80 + \sqrt{10}\,\sigma = 85 \quad \text{or} \quad \sigma = \frac{5}{\sqrt{10}} \approx 1.58$$
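
The three Chebyshev problems above all reduce to solving 1 - 1/k^2 = c for k and then converting k into a distance from the mean. The sketch below is illustrative only; `chebyshev_k`, `chebyshev_bound`, and `chebyshev_interval` are hypothetical helper names.

```python
from math import sqrt


def chebyshev_bound(k):
    """Lower bound 1 - 1/k**2 on P(mu - k*sigma <= X <= mu + k*sigma)."""
    return 1.0 - 1.0 / k ** 2


def chebyshev_k(confidence):
    """The k for which the Chebyshev lower bound equals the given confidence."""
    return 1.0 / sqrt(1.0 - confidence)


def chebyshev_interval(mu, sigma, confidence):
    """Interval [mu - k*sigma, mu + k*sigma] guaranteed to hold X with the given probability."""
    k = chebyshev_k(confidence)
    return mu - k * sigma, mu + k * sigma


print(chebyshev_bound(5), chebyshev_bound(2.5))      # Problem 5.33
print(chebyshev_interval(40, 5, 0.95))               # Problem 5.34
print(5 / chebyshev_k(0.9))                          # Problem 5.35: sigma with k*sigma = 5
```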


MISCELLANEOUS PROBLEMS 
5.36. Let X be a continuous random variable with the following distribution: 

$$f(x) = \begin{cases} \tfrac{1}{6}x + k & \text{if } 0 \le x \le 3 \\ 0 & \text{elsewhere} \end{cases}$$

(a) Evaluate k. (b) Find $P(1 \le X \le 2)$. 


(a) The graph of f is drawn in Fig. 5-20(a). Since f is a continuous probability function, the shaded 
region A must have area 1. Note that A forms a trapezoid with parallel bases of length k and $k + \tfrac{1}{2}$ 
and altitude 3. Setting the area of A equal to 1 yields 

$$\tfrac{1}{2}\left(k + k + \tfrac{1}{2}\right)(3) = 1 \quad \text{or} \quad k = \tfrac{1}{12}$$

Fig. 5-20: (a) Graph of f; (b) $P(1 \le X \le 2)$ = area of B 


(b) $P(1 \le X \le 2)$ is equal to the area of B which is under the graph of f between x = 1 and x = 2, as 
shown in Fig. 5-20(b). Note: 

$$f(1) = \tfrac{1}{6} + \tfrac{1}{12} = \tfrac{3}{12} \qquad \text{and} \qquad f(2) = \tfrac{2}{6} + \tfrac{1}{12} = \tfrac{5}{12}$$

Hence $P(1 \le X \le 2) = \text{area of } B = \tfrac{1}{2}\left(\tfrac{3}{12} + \tfrac{5}{12}\right)(1) = \tfrac{1}{3}$ 
2\12 12 3 


5.37. Let X be the continuous random variable whose distribution function f forms an isosceles 
triangle above the unit interval I = [0,1] and 0 elsewhere (as pictured in Fig. 5-21). 


(a) Find k, the height of the triangle. (b) Find the formula which defines f. 
(c) Find the mean $\mu = E(X)$ of X. 


(a) The shaded region A in Fig. 5-21 must have area 1. Hence 

$$\tfrac{1}{2}(1)k = 1 \quad \text{or} \quad k = 2$$

(b) Note that f is linear between x = 0 and x = 1/2 with slope m = 2/(1/2) = 4, and f is linear between 
x = 1/2 and x = 1 with slope m = -4. Hence 

$$f(x) = \begin{cases} 4x & \text{if } 0 \le x \le 1/2 \\ -4x + 4 & \text{if } 1/2 \le x \le 1 \\ 0 & \text{elsewhere} \end{cases}$$

(c) Recall that we may view probability as weight or mass and the mean as the center of gravity. Since 
the triangle is symmetric, it is intuitively clear that $\mu = 1/2$, 
the midpoint of the base of the triangle between 0 and 1. We verify this mathematically using 
calculus: 

$$\mu = E(X) = \int_{-\infty}^{\infty} x f(x)\,dx = \int_0^{1/2} x(4x)\,dx + \int_{1/2}^{1} x(-4x + 4)\,dx$$
$$= \int_0^{1/2} 4x^2\,dx + \int_{1/2}^{1} (-4x^2 + 4x)\,dx$$
$$= \left[\tfrac{4}{3}x^3\right]_0^{1/2} + \left[-\tfrac{4}{3}x^3 + 2x^2\right]_{1/2}^{1}$$
$$= \tfrac{1}{6} + \left[-\tfrac{4}{3} + 2 + \tfrac{1}{6} - \tfrac{1}{2}\right] = \tfrac{1}{2}$$


5.38. Let h be the joint distribution of random variables X and Y. 


(a) Show that the distribution f of the sum Z = X + Y can be obtained by summing the 
probabilities along the diagonal lines $x + y = z_k$, that is, 

$$f(z_k) = \sum_{z_k = x_i + y_j} h(x_i, y_j) = \sum_{x_i} h(x_i, z_k - x_i)$$

(b) Let X and Y be random variables whose joint distribution h appears in Fig. 5-22 (where 
the marginal entries have been omitted). Apply (a) to obtain the distribution f of the sum 
Z = X + Y. 

      | Y = -2  -1    0     1     2     3
X = 0 | 0.05    0.05  0.10  0     0.05  0.05
X = 1 | 0.10    0.05  0.05  0.10  0     0.05
X = 2 | 0.03    0.12  0.07  0.06  0.03  0.04

Fig. 5-22 


(a) The events $\{X = x_i, Y = y_j;\; x_i + y_j = z_k\}$ are disjoint. Therefore 

$$f(z_k) = P(Z = z_k) = \sum_{z_k = x_i + y_j} P(X = x_i, Y = y_j) = \sum_{z_k = x_i + y_j} h(x_i, y_j) = \sum_{x_i} h(x_i, z_k - x_i)$$

(b) Note first that Z = X + Y takes on all integer values between z = -2 (obtained when X = 0 and 
Y = -2) and z = 5 (obtained when X = 2 and Y = 3). Adding along the diagonal lines in Fig. 5-22 
(from lower left to upper right) yields 


f(-2) = 0.05,                          f(2) = 0.07 + 0.10 + 0.05 = 0.22 
f(-1) = 0.10 + 0.05 = 0.15,            f(3) = 0.06 + 0 + 0.05 = 0.11 
f(0) = 0.03 + 0.05 + 0.10 = 0.18,      f(4) = 0.03 + 0.05 = 0.08 
f(1) = 0.12 + 0.05 + 0 = 0.17,         f(5) = 0.04 


Thus, the distribution f of Z = X + Y is as follows: 

z    | -2    -1    0     1     2     3     4     5
f(z) | 0.05  0.15  0.18  0.17  0.22  0.11  0.08  0.04
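
Summing along the diagonals x + y = z amounts to grouping the joint entries by the value of x + y. A minimal sketch of that grouping follows. It assumes the joint table has rows X = 0, 1, 2 and columns Y = -2, ..., 3, with entries chosen to match the diagonal sums worked out in part (b); the entries are stored in hundredths as exact fractions, and `sum_distribution` is an illustrative name.

```python
from collections import defaultdict
from fractions import Fraction

Y_VALUES = [-2, -1, 0, 1, 2, 3]
# Assumed joint table in hundredths (rows X = 0, 1, 2), consistent with the sums in part (b).
ROWS = {
    0: [5, 5, 10, 0, 5, 5],
    1: [10, 5, 5, 10, 0, 5],
    2: [3, 12, 7, 6, 3, 4],
}
h = {(x, y): Fraction(c, 100) for x, row in ROWS.items() for y, c in zip(Y_VALUES, row)}


def sum_distribution(h):
    """Distribution of Z = X + Y: add h(x, y) over each diagonal x + y = z."""
    f = defaultdict(Fraction)
    for (x, y), p in h.items():
        f[x + y] += p
    return dict(sorted(f.items()))


f_z = sum_distribution(h)
for z, p in f_z.items():
    print(z, float(p))
```

Grouping by key in a dictionary avoids having to trace the diagonals by hand and works for any finite joint table.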


5.39. Let X be a discrete random variable with distribution function f. The rth moment $M_r$ of X is 
defined by 

$$M_r = E(X^r) = \sum x_i^r f(x_i)$$

Find the first five moments of X if X has the following distribution: 

x    | -2   1    3
f(x) | 0.3  0.5  0.2


(Note that $M_1$ is the mean of X and $M_2$ is used in computing the variance and standard deviation 
of X.) 
Use the formula for $M_r$ to obtain 

$$M_1 = \sum x_i f(x_i) = -2(0.3) + 1(0.5) + 3(0.2) = 0.5$$
$$M_2 = \sum x_i^2 f(x_i) = (-2)^2(0.3) + 1^2(0.5) + 3^2(0.2) = 3.5$$
$$M_3 = \sum x_i^3 f(x_i) = (-2)^3(0.3) + 1^3(0.5) + 3^3(0.2) = 3.5$$
$$M_4 = \sum x_i^4 f(x_i) = (-2)^4(0.3) + 1^4(0.5) + 3^4(0.2) = 21.5$$
$$M_5 = \sum x_i^5 f(x_i) = (-2)^5(0.3) + 1^5(0.5) + 3^5(0.2) = 39.5$$
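
The moment formula is a one-line sum, which makes the odd powers of the negative value easy to recheck. The sketch below is illustrative; `moment` is a name introduced here and the probabilities are entered as exact fractions.

```python
from fractions import Fraction


def moment(dist, r):
    """rth moment M_r = sum of x**r * f(x) over a finite distribution {x: f(x)}."""
    return sum(x ** r * p for x, p in dist.items())


dist = {-2: Fraction(3, 10), 1: Fraction(5, 10), 3: Fraction(2, 10)}
moments = [moment(dist, r) for r in range(1, 6)]
print([float(m) for m in moments])

# The variance follows from the first two moments.
variance = moments[1] - moments[0] ** 2
print(float(variance))
```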


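5.40. Find the distribution function f of the continuous random variable X whose cumulative 
distribution function F follows: 

$$\text{(a)}\; F(x) = \begin{cases} 0 & x < 0 \\ x & 0 \le x \le 1 \\ 1 & x > 1 \end{cases} \qquad \text{(b)}\; F(x) = \begin{cases} 0 & x < 0 \\ \sin x & 0 \le x \le \pi/2 \\ 1 & x > \pi/2 \end{cases}$$

Recall that $F(x) = \int_{-\infty}^{x} f(t)\,dt$. Calculus tells us that $f(x) = F'(x)$, the derivative of F(x). Thus: 

$$\text{(a)}\; f(x) = \begin{cases} 0 & x < 0 \\ 1 & 0 \le x \le 1 \\ 0 & x > 1 \end{cases} \qquad \text{(b)}\; f(x) = \begin{cases} 0 & x < 0 \\ \cos x & 0 \le x \le \pi/2 \\ 0 & x > \pi/2 \end{cases}$$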

PROOF OF THEOREMS 


Remark: In all proofs, X is a random variable with distribution f, Y is a random variable with 
distribution g, and h is their joint distribution. 


5.41. Prove Theorem 5.1. Let S be an equiprobable space, and let X be a random variable on S with 
range space $R_X = \{x_1, x_2, \ldots, x_t\}$. Then 

$$p_i = P(x_i) = \frac{\text{number of points in } S \text{ whose image is } x_i}{\text{number of points in } S}$$

Let S have n points and let $s_1, s_2, \ldots, s_r$ be the points in S with image $x_i$. We wish to show that 
$p_i = f(x_i) = r/n$. By definition, 

$$p_i = f(x_i) = \text{sum of the probabilities of the points in } S \text{ whose image is } x_i = P(s_1) + P(s_2) + \cdots + P(s_r)$$

Since S is an equiprobable space, each of the n points in S has probability 1/n. Hence 

$$p_i = f(x_i) = \underbrace{\frac{1}{n} + \frac{1}{n} + \cdots + \frac{1}{n}}_{r \text{ times}} = \frac{r}{n}$$

5.42. Show that the marginal distributions, $f(x_i) = \sum_j h(x_i, y_j)$ and $g(y_j) = \sum_i h(x_i, y_j)$, are the 
(individual) distributions of X and Y. 
Let $A_i = \{X = x_i\}$ and $B_j = \{Y = y_j\}$, that is, let $A_i = X^{-1}(x_i)$ and $B_j = Y^{-1}(y_j)$. Thus, the $B_j$ are 
disjoint and $S = \cup_j B_j$. Hence 

$$A_i = A_i \cap S = A_i \cap (\cup_j B_j) = \cup_j (A_i \cap B_j)$$

where the $A_i \cap B_j$ are also disjoint. Accordingly 

$$f(x_i) = P(X = x_i) = P(A_i) = \sum_j P(A_i \cap B_j) = \sum_j P(X = x_i, Y = y_j) = \sum_j h(x_i, y_j)$$

The proof for g is similar. 


5.43. Prove Theorem 5.10. Let X and Y be random variables on S with $Y = \Phi(X)$. Then
$E(Y) = \sum_i \Phi(x_i) f(x_i)$ where f is the distribution of X.
(Proof is given for the case where X is discrete and finite.)

Suppose X takes on the values $x_1, \ldots, x_n$ and $\Phi(x_i)$ takes on the values $y_1, \ldots, y_m$ as i runs from 1 to n. Then clearly the possible values of $Y = \Phi(X)$ are $y_1, \ldots, y_m$ and the distribution g of Y is given by

$$g(y_j) = \sum_{\{i:\, \Phi(x_i) = y_j\}} f(x_i)$$

Therefore

$$E(Y) = \sum_{j=1}^{m} y_j g(y_j) = \sum_{j=1}^{m} y_j \sum_{\{i:\, \Phi(x_i) = y_j\}} f(x_i) = \sum_{i=1}^{n} f(x_i) \sum_{\{j:\, \Phi(x_i) = y_j\}} y_j = \sum_{i=1}^{n} f(x_i)\, \Phi(x_i)$$

which proves the theorem.
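The two routes to E(Y) used in this proof can be compared on a small example. The sketch below assumes an illustrative distribution f and the function Φ(x) = x⁴, neither of which is prescribed by the theorem; it first builds the distribution g of Y by grouping values of X with the same image, and then applies Theorem 5.10 directly.

```python
from fractions import Fraction
from collections import defaultdict

# Illustrative distribution f of X (values chosen for this sketch only).
f = {-1: Fraction(1, 5), 1: Fraction(1, 2), 2: Fraction(3, 10)}

def phi(x):
    """Illustrative choice of the function Phi."""
    return x ** 4

# Route 1: build the distribution g of Y = phi(X) by grouping the x_i
# that share the same image y_j, then compute E(Y) = sum of y_j * g(y_j).
g = defaultdict(Fraction)
for x, p in f.items():
    g[phi(x)] += p
e_via_g = sum(y * q for y, q in g.items())

# Route 2: Theorem 5.10, E(Y) = sum of phi(x_i) * f(x_i).
e_via_f = sum(phi(x) * p for x, p in f.items())

print(dict(g), e_via_g, e_via_f)
```

Here the values -1 and 1 of X share the image 1, so g has only two points while f has three; both routes still give the same expectation.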


5.44. Prove Theorem 5.2. Let X be a random variable and let k be a real number. Then
(i) $E(kX) = kE(X)$. (ii) $E(X + k) = E(X) + k$.

(Proof is given for the general discrete case with the assumption that E(X) exists.)

(i) Now $kX = \Phi(X)$ where $\Phi(x) = kx$. Therefore, by Theorem 5.10 (Problem 5.43),
$$E(kX) = \sum_i k x_i f(x_i) = k \sum_i x_i f(x_i) = kE(X)$$
(ii) Here $X + k = \Phi(X)$ where $\Phi(x) = x + k$. Therefore, using $\sum_i f(x_i) = 1$,
$$E(X + k) = \sum_i (x_i + k) f(x_i) = \sum_i x_i f(x_i) + k \sum_i f(x_i) = E(X) + k$$




5.45. Prove Theorem 5.3. Let X and Y be random variables on S. Then
$$E(X + Y) = E(X) + E(Y)$$

(Proof is given for the general discrete case and the assumption that E(X) and E(Y) both exist.)

Now $X + Y = \Phi(X, Y)$ where $\Phi(x, y) = x + y$. Therefore, by Theorem 5.11 (Problem 5.92),
$$E(X + Y) = \sum_i \sum_j (x_i + y_j) h(x_i, y_j) = \sum_i \sum_j x_i h(x_i, y_j) + \sum_i \sum_j y_j h(x_i, y_j)$$
By Problem 5.42, $f(x_i) = \sum_j h(x_i, y_j)$ and $g(y_j) = \sum_i h(x_i, y_j)$. Thus

$$E(X + Y) = \sum_i x_i f(x_i) + \sum_j y_j g(y_j) = E(X) + E(Y)$$


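5.46. Prove Theorem 5.6. $\mathrm{var}(aX + b) = a^2\,\mathrm{var}(X)$.

We prove separately that: (i) $\mathrm{var}(X + k) = \mathrm{var}(X)$, and (ii) $\mathrm{var}(kX) = k^2\,\mathrm{var}(X)$, from which the theorem follows. By Theorem 5.2, $\mu_{X+k} = \mu_X + k$ and $\mu_{kX} = k\mu_X$. Also, $\sum x_i f(x_i) = \mu_X$ and $\sum f(x_i) = 1$.

Therefore
$$\mathrm{var}(X + k) = \sum (x_i + k)^2 f(x_i) - \mu_{X+k}^2$$
$$= \sum x_i^2 f(x_i) + 2k \sum x_i f(x_i) + k^2 \sum f(x_i) - (\mu_X + k)^2$$
$$= \sum x_i^2 f(x_i) + 2k\mu_X + k^2 - (\mu_X^2 + 2k\mu_X + k^2)$$
$$= \sum x_i^2 f(x_i) - \mu_X^2 = \mathrm{var}(X)$$
and
$$\mathrm{var}(kX) = \sum (k x_i)^2 f(x_i) - \mu_{kX}^2 = k^2 \sum x_i^2 f(x_i) - (k\mu_X)^2$$
$$= k^2 \sum x_i^2 f(x_i) - k^2 \mu_X^2 = k^2 \Big(\sum x_i^2 f(x_i) - \mu_X^2\Big) = k^2\,\mathrm{var}(X)$$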
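The identity var(aX + b) = a² var(X) is easy to confirm on a concrete distribution. The sketch below is only an illustration: the distribution `f` and the constants a = 3, b = 2 are assumed values, and the helper names `mean`, `var`, and `affine` are introduced here for the example.

```python
from fractions import Fraction

def mean(dist):
    """E(X) for a finite distribution given as {value: probability}."""
    return sum(x * p for x, p in dist.items())

def var(dist):
    """var(X) = E(X^2) - mu^2."""
    m = mean(dist)
    return sum(x * x * p for x, p in dist.items()) - m * m

def affine(dist, a, b):
    """Distribution of aX + b (a != 0, so distinct values stay distinct)."""
    return {a * x + b: p for x, p in dist.items()}

# Illustrative distribution of X.
f = {1: Fraction(2, 5), 3: Fraction(1, 10), 4: Fraction(1, 5), 5: Fraction(3, 10)}
a, b = 3, 2

lhs = var(affine(f, a, b))   # var(aX + b) computed from the new distribution
rhs = a * a * var(f)         # a^2 var(X)
print(var(f), lhs, rhs)
```

The shift b drops out entirely, while the scale a enters squared, which is exactly what parts (i) and (ii) of the proof show.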


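5.47. Show that
$$\mathrm{cov}(X, Y) = \sum_{i,j} (x_i - \mu_X)(y_j - \mu_Y) h(x_i, y_j) = \sum_{i,j} x_i y_j h(x_i, y_j) - \mu_X \mu_Y$$

(Note that the last term is $E(XY) - \mu_X \mu_Y$.)

(Proof is given for the case that X and Y are discrete and finite.)
We have

$$\sum_{i,j} y_j h(x_i, y_j) = \sum_j y_j g(y_j) = \mu_Y, \qquad \sum_{i,j} x_i h(x_i, y_j) = \sum_i x_i f(x_i) = \mu_X, \qquad \sum_{i,j} h(x_i, y_j) = 1$$

Therefore
$$\sum_{i,j} (x_i - \mu_X)(y_j - \mu_Y) h(x_i, y_j)$$
$$= \sum_{i,j} (x_i y_j - \mu_X y_j - \mu_Y x_i + \mu_X \mu_Y) h(x_i, y_j)$$
$$= \sum_{i,j} x_i y_j h(x_i, y_j) - \mu_X \sum_{i,j} y_j h(x_i, y_j) - \mu_Y \sum_{i,j} x_i h(x_i, y_j) + \mu_X \mu_Y \sum_{i,j} h(x_i, y_j)$$
$$= \sum_{i,j} x_i y_j h(x_i, y_j) - \mu_X \mu_Y - \mu_X \mu_Y + \mu_X \mu_Y$$
$$= \sum_{i,j} x_i y_j h(x_i, y_j) - \mu_X \mu_Y$$

5.48. Prove Theorem 5.7. The standardized random variable Z has mean $\mu_Z = 0$ and standard deviation $\sigma_Z = 1$.

By definition
$$Z = \frac{X - \mu}{\sigma}$$
where X has mean $\mu$ and standard deviation $\sigma > 0$. Using $E(X) = \mu$ and Theorem 5.2, we get

$$\mu_Z = E\Big(\frac{X - \mu}{\sigma}\Big) = E\Big(\frac{X}{\sigma} - \frac{\mu}{\sigma}\Big) = \frac{1}{\sigma} E(X) - \frac{\mu}{\sigma} = \frac{\mu}{\sigma} - \frac{\mu}{\sigma} = 0$$

Also, using Theorem 5.6, we get

$$\mathrm{var}(Z) = \mathrm{var}\Big(\frac{X - \mu}{\sigma}\Big) = \mathrm{var}\Big(\frac{X}{\sigma} - \frac{\mu}{\sigma}\Big) = \frac{1}{\sigma^2}\,\mathrm{var}(X) = \frac{\sigma^2}{\sigma^2} = 1$$
Therefore, $\sigma_Z = \sqrt{\mathrm{var}(Z)} = \sqrt{1} = 1$.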


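5.49. Prove Theorem 5.8. Let X and Y be independent random variables on S. Then:
(i) $E(XY) = E(X)E(Y)$.
(ii) $\mathrm{var}(X + Y) = \mathrm{var}(X) + \mathrm{var}(Y)$.
(iii) $\mathrm{cov}(X, Y) = 0$.

(Proof is given for the case when X and Y are discrete and finite.)
Since X and Y are independent, $h(x_i, y_j) = f(x_i) g(y_j)$. Thus

$$E(XY) = \sum_{i,j} x_i y_j h(x_i, y_j) = \sum_{i,j} x_i y_j f(x_i) g(y_j) = \sum_i x_i f(x_i) \sum_j y_j g(y_j) = E(X)E(Y)$$

and

$$\mathrm{cov}(X, Y) = E(XY) - \mu_X \mu_Y = E(X)E(Y) - \mu_X \mu_Y = 0$$
In order to prove (ii) we also need

$$\mu_{X+Y} = \mu_X + \mu_Y, \qquad \sum_{i,j} x_i^2 h(x_i, y_j) = \sum_i x_i^2 f(x_i), \qquad \sum_{i,j} y_j^2 h(x_i, y_j) = \sum_j y_j^2 g(y_j)$$

Hence

$$\mathrm{var}(X + Y) = \sum_{i,j} (x_i + y_j)^2 h(x_i, y_j) - \mu_{X+Y}^2$$
$$= \sum_{i,j} x_i^2 h(x_i, y_j) + 2 \sum_{i,j} x_i y_j h(x_i, y_j) + \sum_{i,j} y_j^2 h(x_i, y_j) - (\mu_X + \mu_Y)^2$$
$$= \sum_i x_i^2 f(x_i) + 2 \sum_i x_i f(x_i) \sum_j y_j g(y_j) + \sum_j y_j^2 g(y_j) - \mu_X^2 - 2\mu_X \mu_Y - \mu_Y^2$$
$$= \Big[\sum_i x_i^2 f(x_i) - \mu_X^2\Big] + \Big[\sum_j y_j^2 g(y_j) - \mu_Y^2\Big] = \mathrm{var}(X) + \mathrm{var}(Y)$$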
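The three conclusions of Theorem 5.8 can be verified on a small joint distribution built as a product of marginals. This is a hedged sketch: the marginals `f` and `g` are illustrative values, and `expect` is a helper introduced here, not notation from the text.

```python
from fractions import Fraction as Fr
from itertools import product

# Illustrative marginal distributions of X and Y.
f = {1: Fr(7, 10), 2: Fr(3, 10)}
g = {-2: Fr(3, 10), 5: Fr(1, 2), 8: Fr(1, 5)}

# Independence means the joint distribution factors: h(x, y) = f(x) g(y).
h = {(x, y): f[x] * g[y] for x, y in product(f, g)}

def expect(func):
    """E(func(X, Y)) computed from the joint distribution h."""
    return sum(func(x, y) * p for (x, y), p in h.items())

mu_x = expect(lambda x, y: x)
mu_y = expect(lambda x, y: y)
cov = expect(lambda x, y: x * y) - mu_x * mu_y
var_x = expect(lambda x, y: x * x) - mu_x ** 2
var_y = expect(lambda x, y: y * y) - mu_y ** 2
var_sum = expect(lambda x, y: (x + y) ** 2) - (mu_x + mu_y) ** 2

print(cov, var_x + var_y, var_sum)
```

The converse does not hold in general: a zero covariance computed this way does not by itself show that a given joint distribution factors.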


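5.50. Prove Theorem 5.9. Let $X_1, X_2, \ldots, X_n$ be independent random variables on S. Then
$$\mathrm{var}(X_1 + X_2 + \cdots + X_n) = \mathrm{var}(X_1) + \mathrm{var}(X_2) + \cdots + \mathrm{var}(X_n)$$

(Proof is given for the case when $X_1, X_2, \ldots, X_n$ are all discrete and finite.)
We take for granted the analogs of Problem 5.49 and Theorem 5.11 for n random variables. Then

$$\mathrm{var}(X_1 + \cdots + X_n) = E\big((X_1 + \cdots + X_n - \mu_{X_1 + \cdots + X_n})^2\big)$$
$$= \sum (x_1 + \cdots + x_n - \mu_{X_1 + \cdots + X_n})^2\, h(x_1, \ldots, x_n)$$
$$= \sum (x_1 + \cdots + x_n - \mu_{X_1} - \cdots - \mu_{X_n})^2\, h(x_1, \ldots, x_n)$$
$$= \sum \Big\{ \sum_i \sum_j x_i x_j + \sum_i \sum_j \mu_{X_i} \mu_{X_j} - 2 \sum_i \sum_j \mu_{X_i} x_j \Big\}\, h(x_1, \ldots, x_n)$$
where h is the joint distribution of $X_1, \ldots, X_n$ and $\mu_{X_1 + \cdots + X_n} = \mu_{X_1} + \cdots + \mu_{X_n}$. Since the $X_i$ are pairwise independent, $\sum x_i x_j h(x_1, \ldots, x_n) = \mu_{X_i} \mu_{X_j}$ for $i \ne j$. Hence

$$\mathrm{var}(X_1 + \cdots + X_n) = \sum_{i \ne j} \mu_{X_i} \mu_{X_j} + \sum_{i=1}^{n} E(X_i^2) + \sum_i \sum_j \mu_{X_i} \mu_{X_j} - 2 \sum_i \sum_j \mu_{X_i} \mu_{X_j}$$
$$= \sum_{i=1}^{n} E(X_i^2) - \sum_{i=1}^{n} (\mu_{X_i})^2 = \sum_{i=1}^{n} \mathrm{var}(X_i) = \mathrm{var}(X_1) + \cdots + \mathrm{var}(X_n)$$

as required.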


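5.51. Prove Theorem 5.12 (Chebyshev's Inequality). For any $k > 0$,
$$P(\mu - k\sigma \le X \le \mu + k\sigma) \ge 1 - \frac{1}{k^2}$$

Note first that
$$P(|X - \mu| > k\sigma) = 1 - P(|X - \mu| \le k\sigma) = 1 - P(\mu - k\sigma \le X \le \mu + k\sigma)$$
By definition
$$\sigma^2 = \mathrm{var}(X) = \sum (x_i - \mu)^2 p_i$$

Delete all terms from the summation for which $x_i$ is in the interval $[\mu - k\sigma, \mu + k\sigma]$, that is, delete all terms for which $|x_i - \mu| \le k\sigma$. Denote the summation of the remaining terms by $\sum^* (x_i - \mu)^2 p_i$. Then

$$\sigma^2 \ge \sum\nolimits^* (x_i - \mu)^2 p_i \ge \sum\nolimits^* k^2 \sigma^2 p_i = k^2 \sigma^2 \sum\nolimits^* p_i$$
$$= k^2 \sigma^2 P(|X - \mu| > k\sigma)$$
$$= k^2 \sigma^2 \big[1 - P(\mu - k\sigma \le X \le \mu + k\sigma)\big]$$

If $\sigma > 0$, then dividing by $k^2 \sigma^2$ gives

$$\frac{1}{k^2} \ge 1 - P(\mu - k\sigma \le X \le \mu + k\sigma)$$

or

$$P(\mu - k\sigma \le X \le \mu + k\sigma) \ge 1 - \frac{1}{k^2}$$
which proves Chebyshev's inequality for $\sigma > 0$. If $\sigma = 0$, then $x_i = \mu$ for all $p_i > 0$, and

$$P(\mu - k \cdot 0 \le X \le \mu + k \cdot 0) = P(X = \mu) = 1 > 1 - \frac{1}{k^2}$$
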
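which completes the proof.

5.52. Let $X_1, X_2, \ldots, X_n$ be n independent and identically distributed random variables, each with mean $\mu$ and variance $\sigma^2$, and let $\bar{X}_n$ be the sample mean, that is,

$$\bar{X}_n = \frac{X_1 + X_2 + \cdots + X_n}{n}$$

(a) Prove that the mean of $\bar{X}_n$ is $\mu$ and the variance is $\sigma^2/n$.

(b) Prove Theorem 5.13 (weak law of large numbers): For any $\alpha > 0$,
$$P(\mu - \alpha \le \bar{X}_n \le \mu + \alpha) \to 1 \quad \text{as } n \to \infty$$

(a) Using Theorems 5.2 and 5.3, we get

$$\mu_{\bar{X}_n} = E(\bar{X}_n) = E\Big(\frac{X_1 + X_2 + \cdots + X_n}{n}\Big) = \frac{1}{n} E(X_1 + X_2 + \cdots + X_n)$$
$$= \frac{1}{n} \big[E(X_1) + E(X_2) + \cdots + E(X_n)\big] = \frac{n\mu}{n} = \mu$$
Now using Theorems 5.6 and 5.9, we get
$$\mathrm{var}(\bar{X}_n) = \mathrm{var}\Big(\frac{X_1 + X_2 + \cdots + X_n}{n}\Big) = \frac{1}{n^2}\,\mathrm{var}(X_1 + X_2 + \cdots + X_n)$$
$$= \frac{1}{n^2} \big[\mathrm{var}(X_1) + \mathrm{var}(X_2) + \cdots + \mathrm{var}(X_n)\big] = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}$$

(b) The proof is based on an application of Chebyshev's inequality to the random variable $\bar{X}_n$. First note that by making the substitution $k\sigma = \alpha$, Chebyshev's inequality can be written as

$$P(\mu - \alpha \le X \le \mu + \alpha) \ge 1 - \frac{\sigma^2}{\alpha^2}$$

Applying Chebyshev's inequality in the form above to $\bar{X}_n$, whose variance is $\sigma^2/n$ by part (a), we get

$$P(\mu - \alpha \le \bar{X}_n \le \mu + \alpha) \ge 1 - \frac{\sigma^2}{n\alpha^2}$$

Since the right side tends to 1 as $n \to \infty$ and a probability cannot exceed 1, the desired result follows.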
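To see how the Chebyshev bound behaves as n grows, one can compare it with an exact probability in a case where the latter is computable. The sketch below assumes, purely for illustration, that each trial is a 0/1 variable with success probability 1/2 (so the mean is 1/2 and the variance 1/4) and that the tolerance is 1/10; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def prob_within(n, alpha, p=Fraction(1, 2)):
    """Exact P(mu - alpha <= sample mean <= mu + alpha) for n 0/1 trials
    with success probability p, where mu = p."""
    q = 1 - p
    total = Fraction(0)
    for k in range(n + 1):
        if abs(Fraction(k, n) - p) <= alpha:   # sample mean k/n is close to mu
            total += comb(n, k) * p**k * q**(n - k)
    return total

def chebyshev_bound(n, alpha, p=Fraction(1, 2)):
    """Lower bound 1 - sigma^2 / (n alpha^2), with sigma^2 = p(1 - p)."""
    sigma2 = p * (1 - p)
    return 1 - sigma2 / (n * alpha**2)

alpha = Fraction(1, 10)
for n in (25, 100, 400):
    print(n, float(prob_within(n, alpha)), float(chebyshev_bound(n, alpha)))
```

The bound is typically far from tight (for small n it can even be zero or negative), but it rises toward 1 as n increases, which is all the weak law requires.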


Supplementary Problems 


RANDOM VARIABLES AND EXPECTED VALUE 


5.53. Suppose a random variable X takes on the values -4, 2, 3, 7 with respective probabilities

$$\frac{k+2}{10}, \quad \frac{2k-3}{10}, \quad \frac{3k-4}{10}, \quad \frac{k+1}{10}$$

Find the distribution and expected value of X.


5.54. A pair of dice is thrown. Let X denote the minimum of the two numbers which occur. Find the 
distribution and expectation of X. 


5.55. A fair coin is tossed 4 times. Let Y denote the longest string of heads. Find the distribution and expectation of Y. (Compare with the random variable X in Problem 5.2.)

5.56. A coin, weighted so that P(H) = 3/4 and P(T) = 1/4, is tossed 3 times. Let X denote the number of heads that appear. (a) Find the distribution of X. (b) Find E(X).

5.57. A coin, weighted so that P(H) = 1/3 and P(T) = 2/3, is tossed until 1 head or 5 tails occur. Find the expected number E of tosses of the coin.

5.58. The probability of team A winning any game is 1/2. Suppose A plays B in a tournament. The first team to win 2 games in a row or 3 games wins the tournament. Find the expected number E of games in the tournament.

5.59. A box contains 10 transistors of which 2 are defective. A transistor is selected from the box and tested until a nondefective one is chosen. Find the expected number E of transistors to be chosen.

5.60. Solve the preceding Problem 5.59 for the case when 3 of the 10 items are defective.

5.61. Five cards are numbered 1 to 5. Two cards are drawn at random (without replacement). Let X denote the sum of the numbers drawn. (a) Find the distribution of X. (b) Find E(X).

5.62. A lottery with 500 tickets gives 1 prize of $100, 3 prizes of $50 each, and 5 prizes of $25 each. (a) Find the expected winnings of a ticket. (b) If a ticket costs $1, what is the expected value of the game?

5.63. A player tosses 3 fair coins. The player wins $5 if 3 heads occur, $3 if two heads occur, and $1 if only 1 head occurs. On the other hand, the player loses $15 if 3 tails occur. Find the value of the game to the player.

5.64. A player tosses 2 fair coins. The player wins $3 if 2 heads occur and $1 if 1 head occurs. For the game to be fair, how much should the player lose if no heads occur?


MEAN, VARIANCE, AND STANDARD DEVIATION 


5.65. Find the mean $\mu$, variance $\sigma^2$, and standard deviation $\sigma$ of each distribution:

(a)  x    |  2    3    8
     f(x) | 1/4  1/2  1/4

(b)  x    | -2   -1    7
     f(x) | 1/3  1/2  1/6

5.66. Find the mean $\mu$, variance $\sigma^2$, and standard deviation $\sigma$ of each distribution:

(a)  x    | -1    0    1    2    3
     f(x) | 0.3  0.1  0.1  0.3  0.2

(b)  x    |  1    2    3    6    7
     f(x) | 0.2  0.1  0.3  0.1  0.3

5.67. Let X be a random variable with the following distribution:

     x    |  1    3    4    5
     f(x) | 0.4  0.1  0.2  0.3

Find the mean $\mu$, variance $\sigma^2$, and standard deviation $\sigma$ of X.

5.68. Let X be the random variable in Problem 5.67. Find the distribution, mean $\mu$, variance $\sigma^2$, and standard deviation $\sigma$ of each random variable Y: (a) $Y = 3X + 2$, (b) $Y = X^2$, (c) $Y = 2^X$.

5.69. Let X be a random variable with the following distribution:

     x    | -1    1    2
     f(x) | 0.2  0.5  0.3

Find the mean $\mu$, variance $\sigma^2$, and standard deviation $\sigma$ of X.

5.70. Let X be the random variable in Problem 5.69. Find the distribution, mean $\mu$, variance $\sigma^2$, and standard deviation $\sigma$ of the random variable $Y = \Phi(X)$ where

(a) $\Phi(x) = x^4$, (b) $\Phi(x) = 3^x$, (c) $\Phi(x) = 2^{x+1}$

5.71. Find the mean $\mu$, variance $\sigma^2$, and standard deviation $\sigma$ of the following two-point distribution where $p + q = 1$:

     x    |  a    b
     f(x) |  p    q

5.72. Show that $\sigma_X = 0$ if and only if X is a constant function, that is, $X(s) = k$ for every $s \in S$ or simply $X = k$.

5.73. Two cards are selected from a box which contains 5 cards numbered 1, 1, 2, 2, and 3. Let X denote the sum and Y the maximum of the 2 numbers drawn. Find the distribution, mean, variance, and standard deviation of the random variables: (a) X, (b) Y, (c) $Z = X + Y$, (d) $W = XY$.


JOINT DISTRIBUTIONS, INDEPENDENT RANDOM VARIABLES 


5.74. Consider the joint distribution of X and Y in Fig. 5-23(a). Find: (a) E(X) and E(Y), (b) cov(X, Y), (c) $\sigma_X$, $\sigma_Y$, and $\rho(X, Y)$.

5.75. Consider the joint distribution of X and Y in Fig. 5-23(b). Find: (a) E(X) and E(Y), (b) cov(X, Y), (c) $\sigma_X$, $\sigma_Y$, and $\rho(X, Y)$.

(a)    (b)
Fig. 5-23

5.76. Suppose X and Y are independent random variables with the following respective distributions:

     x    |  1    2            y    | -2    5    8
     f(x) | 0.7  0.3           g(y) | 0.3  0.5  0.2

Find the joint distribution h of X and Y, and verify that cov(X, Y) = 0.

5.77. Consider the joint distribution of X and Y in Fig. 5-24(a). (a) Find E(X) and E(Y). (b) Determine whether X and Y are independent. (c) Find cov(X, Y).

5.78. Consider the joint distribution of X and Y in Fig. 5-24(b). (a) Find E(X) and E(Y). (b) Determine whether X and Y are independent. (c) Find the distribution, mean, and standard deviation of the random variable $Z = X + Y$.

(a)    (b)
Fig. 5-24

5.79. A fair coin is tossed 4 times. Let X denote the number of heads occurring, and let Y denote the longest string of heads occurring. (See Problems 5.2 and 5.55.)
(a) Determine the joint distribution of X and Y.
(b) Find cov(X, Y) and $\rho(X, Y)$.

5.80. Two cards are selected at random from a box which contains 5 cards numbered 1, 1, 2, 2, and 3. Let X denote the sum and Y the maximum of the 2 numbers drawn. (See Problem 5.73.) (a) Determine the joint distribution of X and Y. (b) Find cov(X, Y) and $\rho(X, Y)$.


CHEBYSHEV’S INEQUALITY 


5.81. Let X be a random variable with mean $\mu$ and standard deviation $\sigma$. Use Chebyshev's inequality to estimate $P(\mu - 3\sigma \le X \le \mu + 3\sigma)$.

5.82. Let Z be the standard normal random variable with mean $\mu = 0$ and standard deviation $\sigma = 1$. Use Chebyshev's inequality to find a value b for which $P(-b \le Z \le b) \ge 0.9$.

5.83. Let X be a random variable with mean $\mu = 0$ and standard deviation $\sigma = 1.5$. Use Chebyshev's inequality to estimate $P(-3 \le X \le 3)$.

5.84. Let X be a random variable with mean $\mu = 70$. For what value of $\sigma$ will Chebyshev's inequality yield $P(65 \le X \le 75) \ge 0.95$?

5.85. Let X be a random variable with mean $\mu = 100$ and standard deviation $\sigma = 10$. Use Chebyshev's inequality to estimate: (a) $P(X \le 120)$ and (b) $P(X \ge 75)$.


MISCELLANEOUS PROBLEMS 


5.86. Let X be a continuous random variable with the following distribution:

$$f(x) = \begin{cases} 1/8 & \text{if } 0 \le x \le 8 \\ 0 & \text{elsewhere} \end{cases}$$

Find: (a) $P(2 \le X \le 5)$, (b) $P(3 \le X \le 7)$, (c) $P(X \ge 6)$.

5.87. Determine and plot the graph of the cumulative distribution function F of the random variable X in Problem 5.86.

5.88. Let X be a continuous random variable with the following distribution:

$$f(x) = \begin{cases} kx & \text{if } 0 \le x \le 5 \\ 0 & \text{elsewhere} \end{cases}$$

Evaluate k and find: (a) $P(1 \le X \le 3)$, (b) $P(2 \le X \le 4)$, (c) $P(X \le 3)$.

5.89. Plot the graph of the cumulative distribution function F of the discrete random variable X with the following distribution:

     x    | -3    2    6
     f(x) | 1/4  1/2  1/4

5.90. Find the distribution function f(x) of the continuous random variable X whose cumulative distribution function F follows:

(a) $F(x) = 0$ if $x < 0$; $F(x) = x^5$ if $0 \le x \le 1$; $F(x) = 1$ if $x > 1$
(b) $F(x) = 0$ if $x < 0$; $F(x) = \sin \pi x$ if $0 \le x \le 1/2$; $F(x) = 1$ if $x > 1/2$

[Hint: $f(x) = F'(x)$, the derivative of F(x), wherever it exists.]

5.91. Let X be a random variable for which $\sigma_X \ne 0$. Show that $\rho(X, X) = 1$ and $\rho(X, -X) = -1$.

5.92. Prove Theorem 5.11. Let X, Y, Z be random variables on S with $Z = \Phi(X, Y)$. Then
$$E(Z) = \sum_{i,j} \Phi(x_i, y_j)\, h(x_i, y_j)$$
where h is the joint distribution of X and Y.


Answers to Supplementary Problems 


The following notation will be used:

$[x_1, \ldots, x_n;\ f(x_1), \ldots, f(x_n)]$ for the distribution $f = \{(x_i, f(x_i))\}$;
$[x_i;\ y_j;\ \text{row by row}]$ for the joint distribution $h = \{[(x_i, y_j), h(x_i, y_j)]\}$

5.53. k = 2; [-4, 2, 3, 7; 0.4, 0.1, 0.2, 0.3]; E(X) = 1.3.
5.54. [1, 2, 3, 4, 5, 6; 11/36, 9/36, 7/36, 5/36, 3/36, 1/36]; E(X) = 91/36 ≈ 2.5.
5.55. [0, 1, 2, 3, 4; 1/16, 7/16, 5/16, 2/16, 1/16]; E(Y) = 27/16 ≈ 1.7.
5.56. (a) [0, 1, 2, 3; 1/64, 9/64, 27/64, 27/64]; (b) E(X) = 2.25.
5.57. E = 211/81 ≈ 2.6.
5.58. E = 23/8 ≈ 2.9.
5.59. E = 11/9 ≈ 1.2.
5.60. E = 11/8 ≈ 1.4.
5.61. (a) [3, 4, 5, ..., 9; 0.1, 0.1, 0.2, 0.2, 0.2, 0.1, 0.1]; (b) E(X) = 6.
5.62. (a) 0.75; (b) -0.25.
5.63. 0.25.
5.64. $5.
5.65. (a) $\mu = 4$, $\sigma^2 = 5.5$, $\sigma = 2.3$; (b) $\mu = 0$, $\sigma^2 = 10$, $\sigma = 3.2$.
5.66. (a) $\mu = 1$, $\sigma^2 = 2.4$, $\sigma = 1.5$; (b) $\mu = 4.0$, $\sigma^2 = 5.6$, $\sigma = 2.37$.
5.67. $\mu_X = 3$, $\sigma_X^2 = 3$, $\sigma_X = \sqrt{3} \approx 1.7$.
5.68. (a) [5, 11, 14, 17; 0.4, 0.1, 0.2, 0.3], $\mu_Y = 11$, $\sigma_Y^2 = 27$, $\sigma_Y \approx 5.2$;
      (b) [1, 9, 16, 25; 0.4, 0.1, 0.2, 0.3], $\mu_Y = 12$, $\sigma_Y^2 = 103.2$, $\sigma_Y = 10.2$;
      (c) [2, 8, 16, 32; 0.4, 0.1, 0.2, 0.3], $\mu_Y = 14.4$, $\sigma_Y^2 \approx 159.0$, $\sigma_Y \approx 12.6$.
5.69. $\mu_X = 0.9$; $\sigma_X^2 = 1.09$; $\sigma_X = 1.04$.
5.70. (a) [1, 1, 16; 0.2, 0.5, 0.3], $\mu_Y = 5.5$, $\sigma_Y^2 = 47.25$, $\sigma_Y = 6.87$;
      (b) [1/3, 3, 9; 0.2, 0.5, 0.3], $\mu_Y = 4.67$, $\sigma_Y^2 = 5.21$, $\sigma_Y = 2.28$;
      (c) [1, 2, 8; 0.2, 0.5, 0.3], $\mu_Y = 3.6$, $\sigma_Y^2 = 8.44$, $\sigma_Y = 2.91$.
5.71. $\mu = ap + bq$; $\sigma^2 = pq(a - b)^2$; $\sigma = |a - b|\sqrt{pq}$.
5.73. (a) [2, 3, 4, 5; 0.1, 0.4, 0.3, 0.2], $\mu_X = 3.6$, $\sigma_X^2 = 0.84$, $\sigma_X = 0.9$;
      (b) [1, 2, 3; 0.1, 0.5, 0.4], $\mu_Y = 2.3$, $\sigma_Y^2 = 0.41$, $\sigma_Y = 0.64$;
      (c) [3, 5, 6, 7, 8; 0.1, 0.4, 0.1, 0.2, 0.2], $\mu_Z = 5.9$, $\sigma_Z^2 = 2.3$, $\sigma_Z = 1.5$;
      (d) [2, 6, 8, 12, 15; 0.1, 0.4, 0.1, 0.2, 0.2], $\mu_W = 8.8$, $\sigma_W^2 = 17.6$, $\sigma_W = 4.2$.
5.74. (a) E(X) = 3, E(Y) = 1; (b) cov(X, Y) = 1.5; (c) $\sigma_X = 2$, $\sigma_Y = 4.3$, $\rho(X, Y) = 0.17$.
5.75. (a) E(X) = 1.4, E(Y) = 1; (b) cov(X, Y) = -0.5; (c) $\sigma_X = 0.49$, $\sigma_Y = 3.1$, $\rho(X, Y) = -0.3$.
5.76. [1, 2; -2, 5, 8; 0.21, 0.35, 0.14; 0.09, 0.15, 0.06].
5.77. (a) E(X) = 1.7, E(Y) = 3.1; (b) yes; (c) must equal 0 since X and Y are independent.
5.78. (a) E(X) = 1.05, E(Y) = 0.16;
      (b) no;
      (c) [-2, -1, 0, 1, 2, 3, 4, 5; 0.05, 0.15, 0.18, 0.17, 0.22, 0.11, 0.08, 0.04], $\mu_Z = 1.21$, $\sigma_Z = \sqrt{3.21} \approx 1.79$.
5.79. (a) [0, 1, 2, 3, 4; 0, 1, 2, 3, 4; 1/16, 0, 0, 0, 0; 0, 4/16, 0, 0, 0; 0, 3/16, 3/16, 0, 0; 0, 0, 2/16, 2/16, 0; 0, 0, 0, 0, 1/16];
      (b) cov(X, Y) = 0.85, $\rho(X, Y) = 0.89$.
5.80. (a) [2, 3, 4, 5; 1, 2, 3; 0.1, 0, 0; 0, 0.4, 0; 0, 0.1, 0.2; 0, 0, 0.2]; (b) cov(X, Y) = 0.52, $\rho(X, Y) = 0.9$.

5.81. $P \ge 1 - \frac{1}{9} \approx 0.89$.
5.82. $b = \sqrt{10} \approx 3.16$.
5.83. $P \ge 0.75$.
5.84. $\sigma = 5/\sqrt{20} \approx 1.12$.
5.85. (a) $P \ge 0.75$; (b) $P \ge 0.84$.
5.86. (a) 3/8; (b) 1/2; (c) 1/4.
5.87. F(x) is equal to: 0 if $x < 0$, $x/8$ if $0 \le x \le 8$, and 1 if $x > 8$. See Fig. 5-25(a).
5.88. k = 2/25; (a) 8/25; (b) 12/25; (c) 9/25.
5.89. See Fig. 5-25(b).
5.90. (a) $f(x) = 5x^4$ between 0 and 1 and $f(x) = 0$ elsewhere;
      (b) $f(x) = \pi \cos \pi x$ between 0 and 1/2 and $f(x) = 0$ elsewhere.


CHAPTER 6

Binomial and Normal Distributions

6.1 INTRODUCTION 


The last chapter defined a random variable X on a probability space S and its probability 
distribution f(x). It was observed that one can discuss X and f(x) without referring to the original 
probability space S. In fact, there are many applications of probability theory which give rise to the 
same probability distribution. This chapter mainly discusses two such important distributions in 
probability—the binomial distribution and the normal distribution. In addition, we will also briefly 
discuss other distributions, including the Poisson and multinomial distributions. Furthermore, we 
indicate how each distribution might be an appropriate probability model for some applications. 

The central limit theorem, which plays a major role in probability theory, is also discussed in this 
chapter. This theorem may be viewed as a generalization of the fact that the discrete binomial 
distribution may be approximated by a continuous normal distribution. 


6.2 BERNOULLI TRIALS, BINOMIAL DISTRIBUTION 


Consider an experiment $\mathcal{E}$ with only two outcomes, one called success (S) and the other called failure (F). (We will let p denote the probability of success in such an experiment and let $q = 1 - p$ denote the probability of failure.) Suppose the experiment $\mathcal{E}$ is repeated and suppose the trials are independent, that is, suppose the outcome of any trial does not depend on any previous outcomes, such as tossing a coin. Such independent repeated trials of an experiment with two outcomes are called Bernoulli trials, named after the Swiss mathematician Jacob Bernoulli (1654-1705).

A binomial experiment consists of a fixed number, say n, of Bernoulli trials. (The term "binomial" comes from Theorem 6.1 below.) Such a binomial experiment will be denoted by

B(n, p)

That is, B(n, p) denotes a binomial experiment with n trials and probability p of success.

Frequently, we are interested in the probability of a certain number of successes in a binomial experiment and not necessarily in the order in which they occur. The following theorem (proved in Problem 6.11) applies.

Theorem 6.1: The probability of exactly k successes in a binomial experiment B(n, p) is given by
$$P(k) = P(k \text{ successes}) = \binom{n}{k} p^k q^{n-k}$$
The probability of one or more successes is $1 - q^n$.


Here $\binom{n}{k}$ is the binomial coefficient, which is defined and discussed in Chapter 2. Recall that C(n, k) is also used for the binomial coefficient. Accordingly, we may alternately write

$$P(k) = C(n, k)\, p^k q^{n-k}$$

Observe that $q^n$ denotes the probability of no successes, and hence $1 - q^n$ denotes the probability of one or more successes. Moreover, the probability of getting at least k successes, that is, k or more successes, is given by

$$P(k) + P(k+1) + P(k+2) + \cdots + P(n)$$

This follows from the fact that the events of getting k and k' successes are disjoint for $k \ne k'$.
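Theorem 6.1 translates directly into a few lines of code. The following is a minimal sketch using exact fractions; the function names `binom_pmf` and `at_least` and the sample values n = 7, p = 1/3 are illustrative choices rather than anything fixed by the text.

```python
from fractions import Fraction
from math import comb

def binom_pmf(n, p, k):
    """P(k) = C(n, k) p^k q^(n-k) for the binomial experiment B(n, p)."""
    q = 1 - p
    return comb(n, k) * p**k * q**(n - k)

def at_least(n, p, k):
    """P(k) + P(k+1) + ... + P(n): probability of k or more successes."""
    return sum(binom_pmf(n, p, j) for j in range(k, n + 1))

p = Fraction(1, 3)
print(binom_pmf(7, p, 3))     # exactly 3 successes in 7 trials
print(at_least(7, p, 1))      # one or more successes, summed term by term
print(1 - (1 - p) ** 7)       # the same probability via 1 - q^n
```

Summing the disjoint events and using the complement 1 − qⁿ give the same value, as the last two printed lines show.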


EXAMPLE 6.1 The probability that Ann hits a target at any time is $p = \frac{1}{3}$; hence she misses with probability $q = 1 - p = \frac{2}{3}$. Suppose she fires at the target 7 times. This is a binomial experiment with n = 7 and $p = \frac{1}{3}$. Find the probability that she hits the target: (a) Exactly 3 times. (b) At least 1 time.

(a) Here k = 3 and hence n - k = 4. By Theorem 6.1, the probability that she hits the target 3 times is

$$P(3) = \binom{7}{3} \Big(\frac{1}{3}\Big)^3 \Big(\frac{2}{3}\Big)^4 = \frac{560}{2187} \approx 0.26$$

(b) The probability that she never hits the target, that is, all failures, is:

$$P(0) = q^7 = \Big(\frac{2}{3}\Big)^7 = \frac{128}{2187} \approx 0.06$$
Thus, the probability that she hits the target at least once is
$$1 - q^7 = \frac{2059}{2187} \approx 0.94 = 94\%$$

EXAMPLE 6.2 A fair coin is tossed 6 times; call heads a success. This is a binomial experiment with n = 6 and $p = q = \frac{1}{2}$. Find the probability that: (a) Exactly 2 heads occur. (b) At least 4 heads occur. (c) At least 1 head occurs.

(a) Here k = 2 and hence n - k = 4. Theorem 6.1 tells us that the probability that exactly 2 heads occur follows:

$$P(2) = \binom{6}{2} \Big(\frac{1}{2}\Big)^2 \Big(\frac{1}{2}\Big)^4 = \frac{15}{64} \approx 0.23$$

(b) The probability of getting at least 4 heads, that is, where k = 4, 5 or 6, follows:

$$P(4) + P(5) + P(6) = \binom{6}{4} \Big(\frac{1}{2}\Big)^4 \Big(\frac{1}{2}\Big)^2 + \binom{6}{5} \Big(\frac{1}{2}\Big)^5 \Big(\frac{1}{2}\Big) + \binom{6}{6} \Big(\frac{1}{2}\Big)^6$$
$$= \frac{15}{64} + \frac{6}{64} + \frac{1}{64} = \frac{22}{64} \approx 0.34$$

(c) The probability of getting no heads (that is, all failures) is $q^6 = (1/2)^6 = 1/64$, so the probability of 1 or more heads is

$$1 - q^6 = 1 - \frac{1}{64} = \frac{63}{64} \approx 0.98$$


Binomial Distribution 


Consider a binomial experiment B(n, p). That is, B(n, p) consists of n independent repeated trials with two outcomes, success or failure, and p is the probability of success and $q = 1 - p$ is the probability of failure. Let X denote the number of successes in such an experiment. Then X is a random variable with the following distribution:

     k    |  0      1                      2                        ...   n
     P(k) |  q^n    C(n,1) q^(n-1) p      C(n,2) q^(n-2) p^2       ...   p^n

This distribution for a binomial experiment B(n, p) is called the binomial distribution since it corresponds to the successive terms of the following binomial expansion:

$$(q + p)^n = q^n + \binom{n}{1} q^{n-1} p + \binom{n}{2} q^{n-2} p^2 + \cdots + p^n$$
Thus, B(n, p) will also be used to denote the above binomial distribution.
EXAMPLE 6.3 Suppose a fair coin is tossed 6 times, and heads is called a success. This is a binomial experiment with n = 6 and $p = q = \frac{1}{2}$. By Example 6.2,

P(2) = 15/64, P(4) = 15/64, P(5) = 6/64, P(6) = 1/64

Similarly,

P(0) = 1/64, P(1) = 6/64, P(3) = 20/64

Thus, the binomial distribution B(6, 1/2) follows:

     k    |  0     1     2      3      4      5     6
     P(k) | 1/64  6/64  15/64  20/64  15/64  6/64  1/64

Properties of the binomial distribution follow:

Theorem 6.2:

     Binomial distribution B(n, p)
     Mean or expected number of successes | $\mu = np$
     Variance                             | $\sigma^2 = npq$
     Standard deviation                   | $\sigma = \sqrt{npq}$

This theorem is proved in Problem 6.18.

EXAMPLE 6.4

(a) The probability that John hits a target is p = 1/4. He fires 100 times. Find the expected number $\mu$ of times he will hit the target and the standard deviation $\sigma$.
Here p = 1/4 and so q = 3/4. Hence

$$\mu = np = 100 \cdot \tfrac{1}{4} = 25 \quad \text{and} \quad \sigma = \sqrt{npq} = \sqrt{100 \cdot \tfrac{1}{4} \cdot \tfrac{3}{4}} = \sqrt{18.75} \approx 4.33$$

(b) A fair die is tossed 180 times. Find the expected number $\mu$ of times the face 6 will appear and the standard deviation $\sigma$.
Here p = 1/6 and so q = 5/6. Hence

$$\mu = np = 180 \cdot \tfrac{1}{6} = 30 \quad \text{and} \quad \sigma = \sqrt{npq} = \sqrt{180 \cdot \tfrac{1}{6} \cdot \tfrac{5}{6}} = 5$$

(c) Find the expected number E(X) of correct answers obtained by guessing in a 30-question true-false test.
Here $p = \frac{1}{2}$. Hence $E(X) = np = 30 \cdot \frac{1}{2} = 15$.
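The formulas μ = np and σ = √(npq) can be cross-checked against the definitions of mean and variance applied to the full distribution. The following sketch does this in floating point for three illustrative (n, p) pairs; `binom_dist` and `mean_and_sd` are helper names introduced here.

```python
from math import comb, sqrt

def binom_dist(n, p):
    """List [P(0), P(1), ..., P(n)] for the binomial distribution B(n, p)."""
    q = 1 - p
    return [comb(n, k) * p**k * q**(n - k) for k in range(n + 1)]

def mean_and_sd(n, p):
    """Mean and standard deviation computed directly from the distribution."""
    dist = binom_dist(n, p)
    mu = sum(k * pk for k, pk in enumerate(dist))
    var = sum(k * k * pk for k, pk in enumerate(dist)) - mu * mu
    return mu, sqrt(var)

for n, p in [(100, 1 / 4), (180, 1 / 6), (30, 1 / 2)]:
    mu, sd = mean_and_sd(n, p)
    # Compare with the closed forms np and sqrt(npq).
    print(n, round(mu, 4), round(n * p, 4), round(sd, 4), round(sqrt(n * p * (1 - p)), 4))
```

Because the computation uses floats, the two columns agree only up to rounding error, which is why the values are rounded before printing.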


6.3 NORMAL DISTRIBUTION 


Let X be a random variable on an infinite sample space S where, by definition, $\{a \le X \le b\}$ is an event in S. Recall (Section 5.10) that X is said to be continuous if there is a function f(x) defined on the real line $\mathbf{R} = (-\infty, \infty)$ such that

(i) f is nonnegative.
(ii) The area under the curve of f is one.
(iii) The probability that X lies in the interval [a, b] is equal to the area under f between x = a and x = b.

These properties may be restated as follows where we use the language of calculus for the area under a curve:

$$\text{(i) } f(x) \ge 0, \qquad \text{(ii) } \int_{-\infty}^{\infty} f(x)\,dx = 1, \qquad \text{(iii) } P(a \le X \le b) = \int_a^b f(x)\,dx$$

The function f(x) is called the probability density function or, simply, distribution of X.
Furthermore, the expected value $\mu = E(X)$ and the variance var(X) of a continuous random variable X, with density function f(x), are defined by the integrals

$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx \quad \text{and} \quad \mathrm{var}(X) = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx$$

As in the case of a discrete random variable, the standard deviation $\sigma$ of X is the nonnegative square root of var(X).


Normal Random Variable 


The most important example of a continuous random variable X is the normal random variable, whose density function has the familiar bell-shaped curve. This distribution was discovered by De Moivre in 1733 as the limiting form of the binomial distribution. Although the normal distribution is sometimes called the "Gaussian distribution" after Gauss, who discussed it in 1809, it was actually already known in 1774 by Laplace.

Formally, a random variable X is said to be normally distributed if its density function f(x) has the following form:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\Big[-\frac{1}{2}\Big(\frac{x - \mu}{\sigma}\Big)^2\Big]$$

where $\mu$ is any real number and $\sigma$ is any positive number.
The above distribution, which depends on the parameters $\mu$ and $\sigma$, will be denoted by

$N(\mu, \sigma^2)$

Thus, we say that X is $N(\mu, \sigma^2)$ if the above function f(x) is its distribution.

The two diagrams in Fig. 6-1 show the changes in the bell-shaped normal curves as $\mu$ and $\sigma$ vary. Specifically, Fig. 6-1(a) shows the distribution for three values of $\mu$ and a constant value of $\sigma$. In Fig. 6-1(b) $\mu$ is constant and three values of $\sigma$ are used.

Observe that each curve in Fig. 6-1 reaches its highest point at $x = \mu$, and that the curve is symmetric about $x = \mu$. The inflection points, where the direction of the bend of the curve changes, occur when $x = \mu + \sigma$ and $x = \mu - \sigma$. Furthermore, although the distribution is defined for all real numbers, the probability of any large deviation from the mean $\mu$ is extremely small and hence may be neglected in most practical applications.
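The shape properties just described (peak at the mean, symmetry, inflection one standard deviation away) can be observed numerically. The sketch below assumes the standard form of the normal density with illustrative parameters μ = 2 and σ = 1.5; `normal_pdf` and `second_diff` are names introduced for this example.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return exp(-0.5 * z * z) / (sigma * sqrt(2 * pi))

mu, sigma = 2.0, 1.5   # illustrative parameters

def second_diff(x, h=1e-3):
    """Discrete second difference: its sign tells which way the curve bends at x."""
    return normal_pdf(x - h, mu, sigma) - 2 * normal_pdf(x, mu, sigma) + normal_pdf(x + h, mu, sigma)

# Highest point at x = mu, and symmetry about x = mu.
print(normal_pdf(mu, mu, sigma), normal_pdf(mu + 0.1, mu, sigma))
print(normal_pdf(mu - 1, mu, sigma), normal_pdf(mu + 1, mu, sigma))

# The bend changes direction at x = mu + sigma: negative just inside, positive just outside.
print(second_diff(mu + sigma - 0.1), second_diff(mu + sigma + 0.1))
```

A second difference is used instead of an analytic derivative simply to keep the check independent of any calculus done by hand.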


(a) Normal distributions with $\sigma$ fixed ($\sigma = 1$)    (b) Normal distributions with $\mu$ fixed ($\mu = 0$)


Fig. 6-1 


Properties of the normal distribution follow:

Theorem 6.3:

     Normal distribution $N(\mu, \sigma^2)$
     Mean or expected value | $\mu$
     Variance               | $\sigma^2$
     Standard deviation     | $\sigma$

That is, the mean, variance, and standard deviation of the normal distribution $N(\mu, \sigma^2)$ are $\mu$, $\sigma^2$, and $\sigma$, respectively. This is why the symbols $\mu$ and $\sigma$ are used as the parameters in the definition of the above density function f(x).


Standardized Normal Distribution


Suppose X is any normal distribution $N(\mu, \sigma^2)$. Recall that the standardized random variable corresponding to X is defined by

$$Z = \frac{X - \mu}{\sigma}$$

We note that Z is also a normal distribution and that $\mu = 0$ and $\sigma = 1$, that is, Z is N(0, 1). The density function for Z, obtained by setting $z = (x - \mu)/\sigma$ in the above formula for $N(\mu, \sigma^2)$, follows:

$$\phi(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$$

The graph of this function is shown in Fig. 6-2.

Fig. 6-2. Normal distribution N(0, 1).

Figure 6-2 also tells us that the percentage of area under the standardized normal curve $\phi(z)$ and hence also under any normal distribution X is as follows:

     68.2%  for  $-1 \le z \le 1$  and for  $\mu - \sigma \le x \le \mu + \sigma$
     95.4%  for  $-2 \le z \le 2$  and for  $\mu - 2\sigma \le x \le \mu + 2\sigma$
     99.7%  for  $-3 \le z \le 3$  and for  $\mu - 3\sigma \le x \le \mu + 3\sigma$

This gives rise to the so-called

68-95-99.7 rule

This rule says that, in a normally distributed population, 68 percent (approximately) of the population falls within one standard deviation of the mean, 95 percent falls within two standard deviations of the mean, and 99.7 percent falls within three standard deviations of the mean.
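The percentages behind the 68-95-99.7 rule can be reproduced with the error function from Python's standard library, using the identity P(−k ≤ Z ≤ k) = erf(k/√2) for the standard normal variable. This is a small illustrative sketch; `within_k_sd` is a name chosen here.

```python
from math import erf, sqrt

def within_k_sd(k):
    """P(-k <= Z <= k) for the standard normal Z, via the error function."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(k, round(100 * within_k_sd(k), 2), "percent")
```

For k = 1 this gives about 68.27 percent; a figure of 68.2 percent arises when the area is read from a four-decimal table (2 × 0.3413 = 0.6826) and the last digit is dropped.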


6.4 EVALUATING NORMAL PROBABILITIES 


Consider any continuous random variable X with density function f(x). Recall that the probability $P(a \le X \le b)$ is equal to the area under the curve f between x = a and x = b. In the language of calculus,

$$P(a \le X \le b) = \int_a^b f(x)\,dx$$
However, if X is a normal distribution, then we are able to evaluate such areas without calculus. We show how in this section in two steps: first with the standard normal distribution Z, and then with any normal distribution X.

Evaluating Standard Normal Probabilities


Table 6-1 gives the area under the standard normal curve $\phi$ between 0 and z, where $0 \le z < 4$ and z is given in steps of 0.01. This area is denoted by $\Phi(z)$, as indicated by the illustration in the table.
EXAMPLE 6.5 Find: (a) $\Phi(1.72)$, (b) $\Phi(0.46)$, (c) $\Phi(2.3)$, (d) $\Phi(4.3)$.

(a) To find $\Phi(1.72)$, look down on the left for the row labeled 1.7, and then continue right for the column labeled 2. The entry in the table corresponding to row 1.7 and column 2 is 0.4573. Thus, $\Phi(1.72) = 0.4573$.

(b) To find $\Phi(0.46)$, look down on the left for the row labeled 0.4, and then continue right for the column labeled 6. The entry corresponding to row 0.4 and column 6 is 0.1772. Thus, $\Phi(0.46) = 0.1772$.

(c) To find $\Phi(2.3)$, look on the left for the row labeled 2.3. The first entry 0.4893 in the row corresponds to 2.3 = 2.30. Thus, $\Phi(2.3) = 0.4893$.

(d) The value of $\Phi(z)$ for any $z \ge 3.9$ is 0.5000. Thus, $\Phi(4.3) = 0.5000$, even though 4.3 is not in the table.

Using Table 6-1 and the symmetry of the curve, we can find $P(z_1 \le Z \le z_2)$, the area under the curve between any two values $z_1$ and $z_2$, as follows:

$$P(z_1 \le Z \le z_2) = \begin{cases} \Phi(z_2) + \Phi(|z_1|) & \text{if } z_1 \le 0 \le z_2 \\ \Phi(z_2) - \Phi(z_1) & \text{if } 0 \le z_1 \le z_2 \\ \Phi(|z_1|) - \Phi(|z_2|) & \text{if } z_1 \le z_2 \le 0 \end{cases}$$

These cases are pictured in Fig. 6-3.

(a) $\Phi(z_2) + \Phi(|z_1|)$    (b) $\Phi(z_2) - \Phi(z_1)$    (c) $\Phi(|z_1|) - \Phi(|z_2|)$
Fig. 6-3
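The three-case rule for the area between two values can be written as a short function. The sketch below replaces the table lookup with the identity Φ(z) = ½·erf(z/√2) for the area between 0 and z ≥ 0, which is an assumption of this example rather than the method of the text; it also assumes z₁ ≤ z₂. The names `Phi` and `prob_between` are illustrative.

```python
from math import erf, sqrt

def Phi(z):
    """Area under the standard normal curve between 0 and z (z >= 0),
    i.e. the quantity tabulated in a half-curve table such as Table 6-1."""
    return 0.5 * erf(z / sqrt(2))

def prob_between(z1, z2):
    """P(z1 <= Z <= z2) for z1 <= z2, using the three cases by sign."""
    if z1 <= 0 <= z2:
        return Phi(z2) + Phi(abs(z1))      # the interval straddles 0
    if 0 <= z1 <= z2:
        return Phi(z2) - Phi(z1)           # both values to the right of 0
    return Phi(abs(z1)) - Phi(abs(z2))     # both values to the left of 0

print(round(Phi(1.72), 4))
print(round(prob_between(-0.5, 1.1), 4))
print(round(prob_between(0.2, 1.4), 4))
print(round(prob_between(-1.5, -0.7), 4))
```

Results computed this way can differ from table-based answers in the fourth decimal place, because the table entries are themselves rounded before being added or subtracted.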


EXAMPLE 6.6 Find the following probabilities for the standard normal distribution Z:
(a) $P(-0.5 \le Z \le 1.1)$    (c) $P(0.2 \le Z \le 1.4)$
(b) $P(-0.38 \le Z \le 1.72)$    (d) $P(-1.5 \le Z \le -0.7)$

(a) Referring to Fig. 6-3(a),
$$P(-0.5 \le Z \le 1.1) = \Phi(1.1) + \Phi(0.5) = 0.3643 + 0.1915 = 0.5558$$
(b) Referring to Fig. 6-3(a),
$$P(-0.38 \le Z \le 1.72) = \Phi(1.72) + \Phi(0.38) = 0.4573 + 0.1480 = 0.6053$$
(c) Referring to Fig. 6-3(b),
$$P(0.2 \le Z \le 1.4) = \Phi(1.4) - \Phi(0.2) = 0.4192 - 0.0793 = 0.3399$$
(d) Referring to Fig. 6-3(c),
$$P(-1.5 \le Z \le -0.7) = \Phi(1.5) - \Phi(0.7) = 0.4332 - 0.2580 = 0.1752$$

Table 6-1 Standard Normal Curve Areas
This table gives areas $\Phi(z)$ under the standard normal distribution $\phi$ between 0 and $z \ge 0$ in steps of 0.01.

z | 0 1 2 3 4 | 5 6 7 8 9
0.0 | 0.0000 0.0040 0.0080 0.0120 0.0160 | 0.0199 0.0239 0.0279 0.0319 0.0359 
0.1 | 0.0398 0.0438 0.0478 0.0517 0.0557 | 0.0596 0.0636 0.0675 0.0714 0.0754 
0.2 | 0.0793 0.0832 0.0871 0.0910 0.0948 | 0.0987 0.1026 0.1064 0.1103 0.1141 
0.3 | 0.1179 0.1217 0.1255 0.1293 0.1331 | 0.1368 0.1406 0.1443 0.1480 0.1517
0.4 | 0.1554 0.1591 0.1628 0.1664 0.1700 | 0.1736 0.1772 0.1808 0.1844 0.1879 
0.5 | 0.1915 0.1950 0.1985 0.2019 0.2054 | 0.2088 0.2123 0.2157 0.2190 0.2224 
0.6 | 0.2258 0.2291 0.2324 0.2357 0.2389 | 0.2422 0.2454 0.2486 0.2518 0.2549 
0.7 | 0.2580 0.2612 0.2642 0.2673 0.2704 | 0.2734 0.2764 0.2794 0.2823 0.2852
0.8 | 0.2881 0.2910 0.2939 0.2967 0.2996 | 0.3023 0.3051 0.3078 0.3106 0.3133 
0.9 | 0.3159 0.3186 0.3212 0.3238 0.3264 | 0.3289 0.3315 0.3340 0.3365 0.3389 
1.0 | 0.3413 0.3438 0.3461 0.3485 0.3508 | 0.3531 0.3554 0.3577 0.3599 0.3621
1.1 | 0.3643 0.3665 0.3686 0.3708 0.3729 | 0.3749 0.3770 0.3790 0.3810 0.3830
1.2 | 0.3849 0.3869 0.3888 0.3907 0.3925 | 0.3944 0.3962 0.3980 0.3997 0.4015 
1.3 | 0.4032 0.4049 0.4066 0.4082 0.4099 | 0.4115 0.4131 0.4147 0.4162 0.4177
1.4 | 0.4192 0.4207 0.4222 0.4236 0.4251 | 0.4265 0.4279 0.4292 0.4306 0.4319 
1.5 | 0.4332 0.4345 0.4357 0.4370 0.4382 | 0.4394 0.4406 0.4418 0.4429 0.4441 
1.6 | 0.4452 0.4463 0.4474 0.4484 0.4495 | 0.4505 0.4515 0.4525 0.4535 0.4545 
1.7 | 0.4554 0.4564 0.4573 0.4582 0.4591 | 0.4599 0.4608 0.4616 0.4625 0.4633
1.8 | 0.4641 0.4649 0.4656 0.4664 0.4671 | 0.4678 0.4686 0.4693 0.4699 0.4706 
1.9 | 0.4713 0.4719 0.4726 0.4732 0.4738 | 0.4744 0.4750 0.4756 0.4761 0.4767 
2.0 | 0.4772 0.4778 0.4783 0.4788 0.4793 | 0.4798 0.4803 0.4808 0.4812 0.4817
2.1 | 0.4821 0.4826 0.4830 0.4834 0.4838 | 0.4842 0.4846 0.4850 0.4854 0.4857
2.2 | 0.4861 0.4864 0.4868 0.4871 0.4875 | 0.4878 0.4881 0.4884 0.4887 0.4890
2.3 | 0.4893 0.4896 0.4898 0.4901 0.4904 | 0.4906 0.4909 0.4911 0.4913 0.4916 
2.4 | 0.4918 0.4920 0.4922 0.4925 0.4927 | 0.4929 0.4931 0.4932 0.4934 0.4936 
2.5 | 0.4938 0.4940 0.4941 0.4943 0.4945 | 0.4946 0.4948 0.4949 0.4951 0.4952 
2.6 | 0.4953 0.4955 0.4956 0.4957 0.4959 | 0.4960 0.4961 0.4962 0.4963 0.4964 
2.7 | 0.4965 0.4966 0.4967 0.4968 0.4969 | 0.4970 0.4971 0.4972 0.4973 0.4974 
2.8 | 0.4974 0.4975 0.4976 0.4977 0.4977 | 0.4978 0.4979 0.4979 0.4980 0.4981 
2.9 | 0.4981 0.4982 0.4982 0.4983 0.4984 | 0.4984 0.4985 0.4985 0.4986 0.4986 
3.0 | 0.4987 0.4987 0.4987 0.4988 0.4988 | 0.4989 0.4989 0.4989 0.4990 0.4990 
3.1 | 0.4990 0.4991 0.4991 0.4991 0.4992 | 0.4992 0.4992 0.4992 0.4993 0.4993 
3.2 | 0.4993 0.4993 0.4994 0.4994 0.4994 | 0.4994 0.4994 0.4995 0.4995 0.4995 
3.3 | 0.4995 0.4995 0.4995 0.4996 0.4996 | 0.4996 0.4996 0.4996 0.4996 0.4997 
3.4 | 0.4997 0.4997 0.4997 0.4997 0.4997 | 0.4997 0.4997 0.4997 0.4997 0.4998 
3.5 | 0.4998 0.4998 0.4998 0.4998 0.4998 | 0.4998 0.4998 0.4998 0.4998 0.4998 
3.6 | 0.4998 0.4998 0.4999 0.4999 0.4999 | 0.4999 0.4999 0.4999 0.4999 0.4999 
3.7 | 0.4999 0.4999 0.4999 0.4999 0.4999 | 0.4999 0.4999 0.4999 0.4999 0.4999 
3.8 | 0.4999 0.4999 0.4999 0.4999 0.4999 | 0.4999 0.4999 0.4999 0.4999 0.4999 
3.9 | 0.5000 0.5000 0.5000 0.5000 0.5000 | 0.5000 0.5000 0.5000 0.5000 0.5000 
The "tail end" of a one-sided probability for the standard normal distribution Z can also be
obtained from Table 6-1 by using the fact that the total area under the normal curve is 1 and hence
half the area is 1/2. There are two cases: the probability that Z ≤ z₁, and the probability that
Z ≥ z₁.

The probability in the first case follows:

P(Z ≤ z₁) = 0.5000 + Φ(z₁)      if 0 ≤ z₁
P(Z ≤ z₁) = 0.5000 − Φ(|z₁|)    if z₁ ≤ 0

These two possibilities are pictured in Fig. 6-4(a).

The probability in the second case follows:

P(Z ≥ z₁) = 0.5000 − Φ(z₁)      if 0 ≤ z₁
P(Z ≥ z₁) = 0.5000 + Φ(|z₁|)    if z₁ ≤ 0

These two possibilities are pictured in Fig. 6-4(b).

Fig. 6-4. (a) P(Z ≤ z₁); (b) P(Z ≥ z₁)


EXAMPLE 6.7 Find the following one-sided probabilities for the standard normal distribution Z:

(a) P(Z ≤ 0.75)    (b) P(Z ≤ −1.2)    (c) P(Z ≥ 0.60)    (d) P(Z ≥ −0.45)

(a) Referring to Fig. 6-4(a),
P(Z ≤ 0.75) = 0.5 + Φ(0.75) = 0.5000 + 0.2734 = 0.7734
(b) Referring to Fig. 6-4(a),
P(Z ≤ −1.2) = 0.5 − Φ(1.2) = 0.5000 − 0.3849 = 0.1151
(c) Referring to Fig. 6-4(b),
P(Z ≥ 0.60) = 0.5 − Φ(0.60) = 0.5000 − 0.2258 = 0.2742
(d) Referring to Fig. 6-4(b),
P(Z ≥ −0.45) = 0.5 + Φ(0.45) = 0.5000 + 0.1736 = 0.6736
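
The table lookups used for one-sided probabilities can also be checked numerically. The following is a minimal Python sketch, not part of the original text. It assumes that Φ(z), the area between 0 and z, can be computed as erf(z/√2)/2 with the standard library error function; the helper names phi_area, prob_le, and prob_ge are hypothetical.

```python
import math

def phi_area(z):
    """Area under the standard normal curve between 0 and z (z >= 0), as in Table 6-1."""
    return 0.5 * math.erf(z / math.sqrt(2))

def prob_le(z1):
    """P(Z <= z1), using the two cases pictured in Fig. 6-4(a)."""
    if z1 >= 0:
        return 0.5 + phi_area(z1)
    return 0.5 - phi_area(abs(z1))

def prob_ge(z1):
    """P(Z >= z1), using the two cases pictured in Fig. 6-4(b)."""
    if z1 >= 0:
        return 0.5 - phi_area(z1)
    return 0.5 + phi_area(abs(z1))

# The four one-sided probabilities asked for in Example 6.7.
print(f"P(Z <= 0.75)  = {prob_le(0.75):.4f}")
print(f"P(Z <= -1.2)  = {prob_le(-1.2):.4f}")
print(f"P(Z >= 0.60)  = {prob_ge(0.60):.4f}")
print(f"P(Z >= -0.45) = {prob_ge(-0.45):.4f}")
```

Computed values may differ from table-based answers in the last digit, because Table 6-1 is rounded to four decimal places.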
Evaluating Arbitrary Normal Probabilities


Suppose X is a normal distribution, say X is N(μ, σ²). We evaluate P(a ≤ X ≤ b) by first
changing a and b into corresponding standard units as follows:

z₁ = (a − μ)/σ    and    z₂ = (b − μ)/σ

Then    P(a ≤ X ≤ b) = P(z₁ ≤ Z ≤ z₂)

This is the area under the standard normal curve between z₁ and z₂, which can be found, as above, using
Table 6-1.
One-sided probabilities are obtained similarly. Namely,

P(X ≤ a) = P(Z ≤ z₁)    and    P(X ≥ a) = P(Z ≥ z₁)

Here again a is changed into its corresponding standard unit using z₁ = (a − μ)/σ.


EXAMPLE 6.8 Suppose X is the normal distribution N(70, 4). Find:
(a) P(68 ≤ X ≤ 74)    (b) P(72 ≤ X ≤ 75)    (c) P(63 ≤ X ≤ 68)    (d) P(X ≥ 73)

X has mean μ = 70 and standard deviation σ = √4 = 2. With reference to Figs. 6-3 and 6-4, we make the
following computations:

(a) Transform a = 68, b = 74 into standard units as follows:

z₁ = (68 − μ)/σ = (68 − 70)/2 = −1,    z₂ = (74 − μ)/σ = (74 − 70)/2 = 2

Therefore [Fig. 6-3(a)]
P(68 ≤ X ≤ 74) = P(−1 ≤ Z ≤ 2) = Φ(2) + Φ(1)
= 0.4772 + 0.3413 = 0.8185

(b) Transform a = 72, b = 75 into standard units:

z₁ = (72 − 70)/2 = 1,    z₂ = (75 − 70)/2 = 2.5

Accordingly [Fig. 6-3(b)]:
P(72 ≤ X ≤ 75) = P(1 ≤ Z ≤ 2.5) = Φ(2.5) − Φ(1)
= 0.4938 − 0.3413 = 0.1525

(c) Transform a = 63, b = 68 into standard units:

z₁ = (63 − 70)/2 = −3.5,    z₂ = (68 − 70)/2 = −1

Therefore [Fig. 6-3(c)]
P(63 ≤ X ≤ 68) = P(−3.5 ≤ Z ≤ −1) = Φ(3.5) − Φ(1)
= 0.4998 − 0.3413 = 0.1585

(d) Transform a = 73 into the standard unit z = (73 − 70)/2 = 1.5. Thus [Fig. 6-4(b)]
P(X ≥ 73) = P(Z ≥ 1.5) = 0.5 − Φ(1.5)
= 0.5000 − 0.4332 = 0.0668
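
The change to standard units can be sketched in code as well. The following Python example is an illustration, not part of the original text; it uses the standard library class statistics.NormalDist for the standard normal cumulative distribution, and the function names standard_units, interval_prob, and upper_tail_prob are hypothetical.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)

def standard_units(x, mu, sigma):
    """Convert x into standard units: z = (x - mu) / sigma."""
    return (x - mu) / sigma

def interval_prob(a, b, mu, sigma):
    """P(a <= X <= b) for X normal with mean mu and standard deviation sigma."""
    z1 = standard_units(a, mu, sigma)
    z2 = standard_units(b, mu, sigma)
    return STANDARD_NORMAL.cdf(z2) - STANDARD_NORMAL.cdf(z1)

def upper_tail_prob(a, mu, sigma):
    """P(X >= a) for X normal with mean mu and standard deviation sigma."""
    return 1.0 - STANDARD_NORMAL.cdf(standard_units(a, mu, sigma))

# Example 6.8: X is N(70, 4), so the variance is 4 and sigma = 2.
mu, sigma = 70.0, 2.0
print(f"(a) {interval_prob(68, 74, mu, sigma):.4f}")
print(f"(b) {interval_prob(72, 75, mu, sigma):.4f}")
print(f"(c) {interval_prob(63, 68, mu, sigma):.4f}")
print(f"(d) {upper_tail_prob(73, mu, sigma):.4f}")
```

Because the table entries are rounded, the printed values can differ from the hand computations by about one unit in the fourth decimal place.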
EXAMPLE 6.9 Verify the above 68-95-99.7 rule, that is, for a normal random variable X, show that:
(a) P(μ − σ ≤ X ≤ μ + σ) ≈ 0.68,    (b) P(μ − 2σ ≤ X ≤ μ + 2σ) ≈ 0.95,

(c) P(μ − 3σ ≤ X ≤ μ + 3σ) ≈ 0.997

In each case, change to standard units and then use Table 6-1:

(a) P(μ − σ ≤ X ≤ μ + σ) = P(−1 ≤ Z ≤ 1) = 2Φ(1) = 2(0.3413) ≈ 0.68

(b) P(μ − 2σ ≤ X ≤ μ + 2σ) = P(−2 ≤ Z ≤ 2) = 2Φ(2) = 2(0.4772) ≈ 0.95

(c) P(μ − 3σ ≤ X ≤ μ + 3σ) = P(−3 ≤ Z ≤ 3) = 2Φ(3) = 2(0.4987) ≈ 0.997
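
The same rule can be confirmed without the table. This short Python sketch is an illustration only; it relies on the identity 2Φ(k) = erf(k/√2), which holds when Φ denotes the area between 0 and k, and the function name within_k_sigma is hypothetical.

```python
import math

def within_k_sigma(k):
    """P(mu - k*sigma <= X <= mu + k*sigma) for any normal X, which equals 2*Phi(k)."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {within_k_sigma(k):.4f}")
```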


Remark: Let X be any continuous random variable, which includes the normal random
variables. Then X has the property that

P(X = a) = P(a ≤ X ≤ a) = 0

Accordingly, for continuous data, such as heights, weights, and temperatures (whose measurements are
really approximations), we usually ask for the probability that X lies in some interval [a, b]. On the
other hand, we may sometimes ask for the probability that "X = a", where we mean the probability
that X lies in some small interval [a − ε, a + ε] centered at a. (Here the ε corresponds to the accuracy
of the measurement.) This is illustrated in the next example.


EXAMPLE 6.10 Suppose the heights of American men are (approximately) normally distributed with mean
μ = 68 inches and standard deviation σ = 2.5 inches. Find the percentage of American men who are:

(a) Between a = 66 and b = 71 inches tall,    (b) (Approximately) 6 ft tall

(a) Transform a and b into standard units, obtaining

z₁ = (66 − 68)/2.5 = −0.80    and    z₂ = (71 − 68)/2.5 = 1.20

Here z₁ < 0 < z₂. Hence
P(66 ≤ X ≤ 71) = P(−0.8 ≤ Z ≤ 1.2) = Φ(1.2) + Φ(0.8)
= 0.3849 + 0.2881 = 0.6730

That is, approximately 67.3 percent of American men are between 66 and 71 inches tall.

(b) Assuming heights are rounded off to the nearest inch, we are really asking the percentage of American men
who are between a = 71.5 and b = 72.5 inches tall. Transform a and b into standard units, obtaining

z₁ = (71.5 − 68)/2.5 = 1.4    and    z₂ = (72.5 − 68)/2.5 = 1.8

Here 0 < z₁ < z₂. Therefore
P(71.5 ≤ X ≤ 72.5) = P(1.4 ≤ Z ≤ 1.8) = Φ(1.8) − Φ(1.4)
= 0.4641 − 0.4192 = 0.0449

That is, about 4.5 percent of American men are (approximately) 6 ft tall.


6.5 NORMAL APPROXIMATION OF THE BINOMIAL DISTRIBUTION


The binomial probabilities P(k) = C(n, k) p^k q^(n−k) become increasingly difficult to compute as n
gets larger. However, there is a way to approximate P(k) by means of a normal distribution when an
exact computation is impractical. This is the topic of this section.
Probability Histogram for B(n, p)


The probability histograms for B(10, 0.1), B(10, 0.5), B(10, 0.7) are pictured in Fig. 6-5. (Rectangles
whose heights are less than 0.01 have been omitted.) Generally speaking, the histogram of a binomial
distribution B(n, p) rises as k approaches the mean μ = np and falls off as k moves away from
μ. Furthermore:

(1) For p = 0.5, the histogram is symmetric about the mean μ as in Fig. 6-5(b).
(2) For p < 0.5, the graph is skewed to the right as in Fig. 6-5(a).
(3) For p > 0.5, the graph is skewed to the left as in Fig. 6-5(c).

Fig. 6-5. Probability histograms of (a) B(10, 0.1), (b) B(10, 0.5), (c) B(10, 0.7)


Consider now the following distribution for B(20, 0.7), where an asterisk (*) indicates that P(k) is
less than 0.01:

k    | 0  1  ...  8  9     10    11    12    13    14    15    16    17    18    19    20
P(k) | *  *  ...  *  0.01  0.03  0.07  0.11  0.16  0.19  0.18  0.13  0.07  0.03  0.01  *

The probability histogram for B(20, 0.7) appears in Fig. 6-6.

Fig. 6-6. Histogram of B(20, 0.7); distribution of N(14, 4.2).
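
The entries of such a distribution table can be generated directly from the binomial formula. The following Python sketch is an illustration, not part of the original text; the function name binom_pmf is hypothetical, and the printing convention (entries that round to 0.00 are shown as an asterisk) is an assumption chosen to mimic the table.

```python
import math

def binom_pmf(k, n, p):
    """Binomial probability P(k) = C(n, k) p^k q^(n-k) with q = 1 - p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 20, 0.7
mean = n * p                # mu = np
variance = n * p * (1 - p)  # sigma^2 = npq

print(f"mean = {mean:.1f}, variance = {variance:.1f}")
for k in range(n + 1):
    prob = binom_pmf(k, n, p)
    # Entries that round to 0.00 are printed as an asterisk.
    shown = f"{prob:.2f}" if round(prob, 2) >= 0.01 else "*"
    print(f"{k:2d}  {shown}")
```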
Although p ≠ 0.5, observe that the histogram for B(20, 0.7) is still nearly symmetric about the
mean μ = np = 20(0.7) = 14 for values of k between 8 and 20. Also, for k outside that range, P(k) is
practically 0. Furthermore, the standard deviation for B(20, 0.7) is (approximately) σ = 2.
Accordingly, the interval [8, 20] is approximately [μ − 3σ, μ + 3σ]. These results are typical for binomial
distributions B(n, p) in which both np and nq are at least 5. We state these results more formally:


Basic Property of the Binomial Probability Histogram: For np ≥ 5 and nq ≥ 5, the probability
histogram for B(n, p) is nearly symmetric about μ = np over the interval [μ − 3σ, μ + 3σ], where
σ = √(npq), and outside this interval P(k) ≈ 0.


Normal Approximation, Central Limit Theorem 


The density curve for the normal distribution N(14, 4.2) is superimposed on the probability
histogram for the binomial distribution B(20, 0.7) in Fig. 6-6. Here μ = 14 and σ = √4.2 for both
distributions. The following is the fundamental relationship between any two such distributions:


For any integer value of k between μ − 3σ and μ + 3σ, the area under the normal curve between
k − 0.5 and k + 0.5 is approximately equal to P(k), the area of the rectangle at k.


In other words:


The binomial probability P(k) can be approximated by the normal probability

P(k − 0.5 ≤ X ≤ k + 0.5)


The following fundamental central limit theorem is the theoretical justification for the above
approximation of B(n, p) by N(np, npq).


Central Limit Theorem 6.4: Let X₁, X₂, X₃, ... be a sequence of independent random variables with
the same distribution and with mean μ and variance σ². Let

Zₙ = (X̄ₙ − μ)/(σ/√n)

where X̄ₙ = (X₁ + X₂ + ··· + Xₙ)/n. Then for large n and any interval {a ≤ x ≤ b},

P(a ≤ Zₙ ≤ b) ≈ P(a ≤ φ ≤ b)

where φ is the standard normal distribution.

Recall that X̄ₙ was called the sample mean of the random variables X₁, ..., Xₙ. Thus, Zₙ in the
above theorem is the standardized sample mean. Roughly speaking, the central limit theorem says
that in any sequence of repeated trials the standardized sample mean approaches the standard normal
curve as the number of trials increases.
6.6 CALCULATIONS OF BINOMIAL PROBABILITIES USING THE NORMAL
APPROXIMATION


Let BP denote the binomial probability and let NP denote the normal probability. As noted
above, for any integer value of k between μ − 3σ and μ + 3σ, we have

BP(k) ≈ NP(k − 0.5 ≤ X ≤ k + 0.5)

Accordingly, for nonnegative integers n₁ and n₂,

BP(n₁ ≤ k ≤ n₂) ≈ NP(n₁ − 0.5 ≤ X ≤ n₂ + 0.5)

These formulas are used in the following examples.


EXAMPLE 6.11 A fair coin is tossed 100 times. Find the probability P that heads occur exactly 60 times.
This is a binomial experiment B(n, p) with n = 100, p = 0.5, and q = 1 − p = 0.5. First we find

μ = np = 100(0.5) = 50,    σ² = npq = 100(0.5)(0.5) = 25,    and so σ = 5

We use the normal distribution to approximate the binomial probability P(60). We have

BP(60) ≈ NP(59.5 ≤ X ≤ 60.5)

Transforming a = 59.5 and b = 60.5 into standard units yields

z₁ = (59.5 − 50)/5 = 1.9    and    z₂ = (60.5 − 50)/5 = 2.1

Here 0 < z₁ < z₂. Therefore [Fig. 6-3(b)]
P = BP(60) ≈ NP(59.5 ≤ X ≤ 60.5) = NP(1.9 ≤ Z ≤ 2.1)
= Φ(2.1) − Φ(1.9) = 0.4821 − 0.4713 = 0.0108

Thus, 60 heads will occur approximately 1 percent of the time.


Remark: The above result agrees with the exact value of BP(60) to four decimal places. That is, to four
decimal places:

BP(60) = C(100, 60)(0.5)⁶⁰(0.5)⁴⁰ = 0.0108

However, calculating BP(60) directly is difficult.
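
With integer arithmetic available, the exact binomial value and its normal approximation can be compared side by side. The sketch below is an illustration, not part of the original text; it uses math.comb and statistics.NormalDist from the Python standard library, and the function names binom_pmf and normal_approx_pmf are hypothetical.

```python
import math
from statistics import NormalDist

def binom_pmf(k, n, p):
    """Exact binomial probability BP(k)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def normal_approx_pmf(k, n, p):
    """Normal approximation NP(k - 0.5 <= X <= k + 0.5) with X ~ N(np, npq)."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    x = NormalDist(mu, sigma)
    return x.cdf(k + 0.5) - x.cdf(k - 0.5)

exact = binom_pmf(60, 100, 0.5)
approx = normal_approx_pmf(60, 100, 0.5)
print(f"exact BP(60)   = {exact:.4f}")
print(f"normal approx. = {approx:.4f}")
```

The half-unit widening of the interval is the continuity correction: each integer k is represented by the interval from k − 0.5 to k + 0.5 under the normal curve.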


EXAMPLE 6.12 A fair coin is tossed 100 times (as in Example 6.11). Find the probability P that heads occur
between 48 and 53 times inclusive.
Again, we have the binomial experiment B(n, p) with n = 100, p = 0.5, and q = 0.5; and again we have

μ = np = 100(0.5) = 50    and    σ = √(npq) = √25 = 5

We seek BP(48 ≤ k ≤ 53) or, assuming the data are continuous, NP(47.5 ≤ X ≤ 53.5). Transforming a = 47.5
and b = 53.5 into standard units yields

z₁ = (47.5 − 50)/5 = −0.5    and    z₂ = (53.5 − 50)/5 = 0.7

Here z₁ < 0 < z₂. Accordingly [Fig. 6-3(a)]
P = BP(48 ≤ k ≤ 53) ≈ NP(47.5 ≤ X ≤ 53.5) = NP(−0.5 ≤ Z ≤ 0.7)
= Φ(0.7) + Φ(0.5) = 0.2580 + 0.1915 = 0.4495

Thus, 48 to 53 heads will occur approximately 45 percent of the time.
Approximation of One-Sided Binomial Probabilities
The following formulas are used for the normal approximation to one-sided binomial probabilities:

BP(k ≤ n₁) ≈ NP(X ≤ n₁ + 0.5)    and    BP(k ≥ n₁) ≈ NP(X ≥ n₁ − 0.5)

The following remark justifies these one-sided approximations.


Remark: For the binomial distribution B(n, p), the binomial variable k lies between 0 and n.
Thus we should actually replace:

BP(k ≤ n₁) by BP(0 ≤ k ≤ n₁)    and    BP(k ≥ n₁) by BP(n₁ ≤ k ≤ n)

This would yield the following approximations:

BP(0 ≤ k ≤ n₁) ≈ NP(−0.5 ≤ X ≤ n₁ + 0.5) = NP(X ≤ n₁ + 0.5) − NP(X ≤ −0.5)
BP(n₁ ≤ k ≤ n) ≈ NP(n₁ − 0.5 ≤ X ≤ n + 0.5) = NP(X ≥ n₁ − 0.5) − NP(X ≥ n + 0.5)

However, NP(X ≤ −0.5) and NP(X ≥ n + 0.5) are very small and can be neglected. This is the reason for
the above one-sided approximations.


EXAMPLE 6.13 A fair coin is tossed 100 times (as in Example 6.11). Find the probability P that heads occur
fewer than 45 times.
Again, we have the binomial experiment B(n, p) with n = 100, p = 0.5, and q = 0.5; and again we have

μ = np = 100(0.5) = 50    and    σ = √(npq) = √25 = 5

We seek BP(k < 45) = BP(k ≤ 44) or, approximately, NP(X ≤ 44.5). Transforming a = 44.5 into standard units
yields

z₁ = (44.5 − 50)/5 = −1.1

Here z₁ < 0. Accordingly [Fig. 6-4(a)]
P = BP(k ≤ 44) ≈ NP(X ≤ 44.5) = NP(Z ≤ −1.1)
= 0.5 − Φ(1.1) = 0.5 − 0.3643 = 0.1357

Thus, fewer than 45 heads will occur approximately 13.6 percent of the time.
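
A one-sided approximation of this kind can be compared with the exact binomial sum. The following Python sketch is an illustration, not part of the original text; the function names binom_cdf and normal_approx_cdf are hypothetical, and statistics.NormalDist supplies the normal cumulative distribution.

```python
import math
from statistics import NormalDist

def binom_cdf(m, n, p):
    """Exact BP(k <= m): sum of binomial probabilities for k = 0, 1, ..., m."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(m + 1))

def normal_approx_cdf(m, n, p):
    """One-sided approximation NP(X <= m + 0.5) with X ~ N(np, npq)."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    return NormalDist(mu, sigma).cdf(m + 0.5)

# Fewer than 45 heads in 100 tosses of a fair coin, that is, k <= 44.
exact = binom_cdf(44, 100, 0.5)
approx = normal_approx_cdf(44, 100, 0.5)
print(f"exact BP(k <= 44) = {exact:.4f}")
print(f"normal approx.    = {approx:.4f}")
```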


6.7 POISSON DISTRIBUTION 


A discrete random variable X is said to have the Poisson distribution with parameter λ > 0 if X
takes on nonnegative integer values k = 0, 1, 2, ... with respective probabilities

P(k) = f(k; λ) = λ^k e^(−λ)/k!

Such a distribution will be denoted by POI(λ). (This distribution is named after S. D. Poisson, who
discovered it in the early part of the nineteenth century.)

The values of f(k; λ) can be obtained by using Table 6-2, which gives values of e^(−λ) for various
values of λ, or by using logarithms.
Table 6-2. Values of e^(−λ)

λ       | 0.2    0.3    0.4    0.5    0.6    0.7    0.8
e^(−λ)  | 0.819  0.741  0.670  0.607  0.549  0.497  0.449

λ       | 3       4       5        6        7         8         9         10
e^(−λ)  | 0.0498  0.0183  0.00674  0.00248  0.000912  0.000335  0.000123  0.000045


The Poisson distribution appears in many natural phenomena, such as the number of telephone
calls per minute at some switchboard, the number of misprints per page in a large text, and the number
of α particles emitted by a radioactive substance. Bar charts of the Poisson distribution for various
values of λ appear in Fig. 6-7.

Fig. 6-7. Poisson distribution for selected values of λ.


Properties of the Poisson distribution follow:


Theorem 6.5: Poisson distribution with parameter λ

Mean or expected value    μ = λ
Variance    σ² = λ
Standard deviation    σ = √λ

Although the Poisson distribution is of independent interest, it also provides us
with a close approximation of the binomial distribution for small k provided p is small
and λ = np (Problem 6.43). This property is indicated in Table 6-3, which compares
the binomial and Poisson distributions for small values of k with n = 100, p = 1/100,
and λ = np = 1.
Table 6-3 Comparison of Binomial and Poisson Distributions with n = 100, p = 1/100, and
λ = np = 1

k        | 0      1      2      3       4       5
Binomial | 0.366  0.370  0.185  0.0610  0.0149  0.0029
Poisson  | 0.368  0.368  0.184  0.0613  0.0153  0.00307


EXAMPLE 6.14 Suppose 2 percent of the items produced by a factory are defective. Find the probability P that
there are 3 defective items in a sample of 100 items.

The binomial distribution with n = 100 and p = 0.02 applies. However, since p is small, we can use the
Poisson approximation with λ = np = 2. Thus

P = f(3; 2) = 2³e^(−2)/3! = 8(0.135)/6 = 0.180

On the other hand, using the binomial distribution, we would need to calculate

P(3) = C(100, 3)(0.02)³(0.98)⁹⁷ = (161,700)(0.02)³(0.98)⁹⁷ = 0.182

Thus, the difference is only about 2 percent. 
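
The Poisson approximation and the exact binomial value can be computed together as a check. This Python sketch is an illustration, not part of the original text; the function names poisson_pmf and binom_pmf are hypothetical.

```python
import math

def poisson_pmf(k, lam):
    """Poisson probability f(k; lambda) = lambda^k e^(-lambda) / k!."""
    return lam**k * math.exp(-lam) / math.factorial(k)

def binom_pmf(k, n, p):
    """Binomial probability P(k) = C(n, k) p^k q^(n-k)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 100, 0.02
lam = n * p  # lambda = np

poisson_value = poisson_pmf(3, lam)
binomial_value = binom_pmf(3, n, p)
print(f"Poisson  f(3; {lam:g}) = {poisson_value:.3f}")
print(f"Binomial P(3)    = {binomial_value:.3f}")
```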


6.8 MISCELLANEOUS DISCRETE RANDOM VARIABLES 


This section discusses a number of miscellaneous discrete random variables. Recall that a 
random variable X is discrete if its range space is finite or countably infinite. 


(a) Multinomial Distribution:


The binomial distribution is generalized as follows. Suppose the sample space S of an experiment ℰ
is partitioned into, say, s mutually exclusive events A₁, A₂, ..., Aₛ with respective probabilities p₁, p₂,
..., pₛ. (Hence p₁ + p₂ + ··· + pₛ = 1.) Then


Theorem 6.6: In n repeated trials, the probability that A₁ occurs k₁ times,
A₂ occurs k₂ times, ..., Aₛ occurs kₛ times is equal to

[n!/(k₁! k₂! ··· kₛ!)] p₁^(k₁) p₂^(k₂) ··· pₛ^(kₛ)

where k₁ + k₂ + ··· + kₛ = n.


The above numbers form the so-called multinomial distribution since they are precisely the terms
in the expansion of the expression (p₁ + p₂ + ··· + pₛ)ⁿ. Observe that when s = 2 we obtain the
binomial distribution discussed at the beginning of the chapter.

The process of repeated trials of the above experiment ℰ implicitly defines s discrete random
variables X₁, X₂, ..., Xₛ. Specifically, define X₁ to be the number of times A₁ occurs when ℰ is
repeated n times. Define X₂ to be the number of times A₂ occurs when ℰ is repeated n times. And
so on. (Observe that the random variables are not independent since the knowledge of any s − 1 of
them gives the remaining one.)


EXAMPLE 6.15 A fair die is tossed 8 times. Find the probability p of obtaining 5 and 6 exactly twice and the
other numbers exactly once.
Here we use the multinomial distribution to obtain

p = [8!/(2! 2! 1! 1! 1! 1!)] (1/6)²(1/6)²(1/6)(1/6)(1/6)(1/6) = 35/5832 ≈ 0.006
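
The multinomial formula is easy to evaluate exactly with rational arithmetic. The following Python sketch is an illustration, not part of the original text; the function name multinomial_prob is hypothetical, and fractions.Fraction is used so that the result is an exact fraction rather than a rounded decimal.

```python
from fractions import Fraction
from math import factorial

def multinomial_prob(counts, probs):
    """Probability that event i occurs counts[i] times in n = sum(counts) trials."""
    n = sum(counts)
    coeff = factorial(n)
    for k in counts:
        coeff //= factorial(k)  # stays an integer at every step
    result = Fraction(coeff)
    for k, p in zip(counts, probs):
        result *= Fraction(p) ** k
    return result

# A fair die tossed 8 times: faces 1 to 4 once each, faces 5 and 6 twice each.
counts = [1, 1, 1, 1, 2, 2]
probs = [Fraction(1, 6)] * 6
p = multinomial_prob(counts, probs)
print(p, float(p))
```

With only two events (s = 2) the same function reproduces a binomial probability, in line with the observation that the binomial distribution is the special case s = 2.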
(b) Hypergeometric Distribution:


Consider a binomial experiment B(n, p), that is, the experiment is repeated n times where each time
the probability of success is p and the probability of failure is q = 1 − p. This experiment may be
modeled as follows:


(i) Choose a population of N elements where pN of the elements are designated as successes and the
remaining qN elements are designated as failures.


(ii) Choose a sample S of n elements with replacement, that is, each time a sample element is chosen,
it is replaced in the population before the next sample element is chosen.


Then Theorem 6.1 tells us that the probability P(k) that the sample S has k success elements is as
follows:


P(k) = C(n, k) p^k q^(n−k)


EXAMPLE 6.16 Consider the binomial experiment B(5, 0.6). Here p = 0.6 and q = 0.4. One model would be
a box with N = 10 marbles of which Np = 6 have the color white (success) and Nq = 4 have the color red
(failure). The probability of choosing a white marble is p = 6/10 = 0.6, as required. Suppose a random sample
S of size n = 5 is chosen, with replacement. The probability of success for each of the n = 5 choices will be
p = 0.6, and hence we have a model of the binomial experiment B(5, 0.6).

The probability P(3) that our sample has exactly 3 white (and hence 2 red) marbles follows:


P(3) = C(5, 3)(0.6)³(0.4)² = 10(0.6)³(0.4)² ≈ 0.346


The hypergeometric distribution applies to sampling when there is no replacement. We illustrate with an 
example. 


EXAMPLE 6.17 A class with N = 10 students has M = 6 men. Hence there are N − M = 4 women. Suppose
a random sample of n = 5 students is selected. Find the probability p = P(3) that exactly k = 3 men (and hence
n − k = 2 women) are selected.

The probability follows:

p = P(3) = C(M, k) C(N − M, n − k)/C(N, n) = C(6, 3) C(4, 2)/C(10, 5) = (20)(6)/252 ≈ 0.476


The denominator C(10,5) denotes the number of possible ways of selecting a sample of n = 5 from the 10 
students. The C(6,3) denotes the number of possible ways of selecting 3 men from the 6 men, and the C(4, 2) 
denotes the number of possible ways of selecting 2 women from the 4 women. 

The following theorem applies, where min(M, n) means the minimum of the two numbers.


Theorem 6.7: Suppose positive integers N, M, n are given with M, n ≤ N. Then the following is a
discrete probability distribution:

P(k) = C(M, k) C(N − M, n − k)/C(N, n)    for k = 0, 1, 2, ..., min(M, n)

The above numbers form the so-called hypergeometric distribution; it is characterized by three 
parameters, n, N, M, and it is sometimes denoted by HYP(n, N, M). A random variable X with this 
distribution is called a hypergeometric random variable. 

If n is much smaller than M and N, then the hypergeometric distribution approaches the binomial 
distribution. Roughly speaking, with a large population, sampling with or without replacement is 
almost identical. 
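
The hypergeometric formula, and its closeness to the binomial distribution for a large population, can be checked numerically. The following Python sketch is an illustration, not part of the original text; the function names hypergeom_pmf and binom_pmf are hypothetical, and the population of 10,000 used in the last step is an arbitrary assumption chosen only to show the limiting behavior.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(k, N, M, n):
    """P(k) = C(M, k) C(N - M, n - k) / C(N, n): sampling without replacement."""
    return Fraction(comb(M, k) * comb(N - M, n - k), comb(N, n))

def binom_pmf(k, n, p):
    """P(k) = C(n, k) p^k q^(n-k): sampling with replacement."""
    p = Fraction(p)
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Class of N = 10 with M = 6 men; sample of n = 5; exactly k = 3 men.
without_replacement = hypergeom_pmf(3, 10, 6, 5)
with_replacement = binom_pmf(3, 5, Fraction(6, 10))
print(float(without_replacement), float(with_replacement))

# Same proportion of successes in a much larger (hypothetical) population.
large_population = hypergeom_pmf(3, 10_000, 6_000, 5)
print(float(large_population))
```

Note that math.comb returns 0 when the lower index exceeds the upper one, so impossible samples automatically get probability 0.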
(c) Geometric Distribution:
Consider repeated trials of a Bernoulli experiment ℰ with probability p of success and q = 1 − p of
failure. Let X denote the number of times ℰ must be repeated until finally obtaining a success. (For
example, one may continually fire at a target until finally hitting the target.) Then X is a random
variable with the following distribution:

k    | 1   2    3     4     5     ...
P(k) | p   qp   q²p   q³p   q⁴p   ...

In other words, ℰ will be repeated k times only in the case that there is a sequence of k − 1 failures
followed by a success. Thus, P(k) = q^(k−1) p = pq^(k−1), as indicated by the above distribution table.

First we show that the above is a probability distribution, that is, that Σ q^(k−1) p = 1. Recall the
geometric series

1 + q + q² + ··· = 1/(1 − q)

Hence, using p = 1 − q, we have

Σ q^(k−1) p = p(1 + q + q² + ···) = p/(1 − q) = p/p = 1

as required.

The above distribution is called the geometric distribution; it is characterized by a single parameter
p since q = 1 − p, and it is sometimes denoted by GEO(p). A random variable X with such a
distribution is called a geometric random variable.

The following theorem applies. 


Theorem 6.8: Let X be a geometric random variable with distribution GEO(p). Then
(i) Expectation E(X) = 1/p.
(ii) Variance var(X) = q/p².
(iii) Cumulative distribution F(k) = 1 − q^k.
(iv) P(k > r) = q^r.
(See Problems 6.78, 6.90, and 6.91.)


EXAMPLE 6.18 Suppose the probability of a rocket hitting a target is p = 0.2, and a rocket is repeatedly fired
until the target is hit.

(a) Find the expected number E of rockets which will be fired.
(b) Find the probability P that 4 or more rockets will be needed to finally hit the target.

(a) By Theorem 6.8, E = 1/p = 1/(0.2) = 5.
(b) First find q = 1 − 0.2 = 0.8. Then, by Theorem 6.8,
P(k > 3) = q³ = (0.8)³ = 0.512

That is, there is about a 50-50 chance of hitting the target with fewer than 4 rockets.
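
The geometric formulas can be checked by summing the distribution directly. This Python sketch is an illustration, not part of the original text; the function names geometric_pmf and geometric_tail are hypothetical, and the infinite series for the mean is truncated at 500 terms, which is an assumption that is harmless here because the neglected terms are negligibly small.

```python
def geometric_pmf(k, p):
    """P(k) = q^(k-1) p: first success occurs on trial k."""
    return (1 - p) ** (k - 1) * p

def geometric_tail(r, p):
    """P(k > r) = q^r: the first r trials are all failures."""
    return (1 - p) ** r

p = 0.2
expected = 1 / p  # E(X) = 1/p
partial_mean = sum(k * geometric_pmf(k, p) for k in range(1, 500))

print(f"E = {expected}, truncated series for the mean = {partial_mean:.6f}")
print(f"P(k > 3) = {geometric_tail(3, p):.3f}")
print(f"P(k <= 3) = {sum(geometric_pmf(k, p) for k in (1, 2, 3)):.3f}")
```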
6.9 MISCELLANEOUS CONTINUOUS RANDOM VARIABLES


This section discusses another two continuous random variables, one with a uniform distribution
on an interval and the other with an exponential distribution. Recall that a random variable X is
continuous if its range space is noncountable and if there exists a density function f(x) defined on
R = (−∞, ∞) such that

(i) f(x) ≥ 0,    (ii) ∫_{−∞}^{∞} f(x) dx = 1,    (iii) P(a ≤ X ≤ b) = ∫_{a}^{b} f(x) dx

That is,

(i) f is nonnegative.
(ii) The area under its curve is one.
(iii) The probability that X lies in the interval [a, b] is equal to the area under f between x = a and
x = b.

The cumulative distribution function F(x) for the density function f(x) is defined by

F(x) = ∫_{−∞}^{x} f(t) dt


Frequently, a continuous random variable X is defined by simply giving its density function 
f(x). Also, if f(x) is explicitly given for only certain values of x, then we assume f(x) = 0 for the 
remaining values of x in R. 


(a) Uniform Distribution on an Interval: 


A continuous random variable X is said to have a uniform distribution on an interval I = [a, b], where
a < b, if its density function f(x) has the constant value k on the interval and zero elsewhere. Since
the area under f must be 1, we easily get that k(b − a) = 1 or that k = 1/(b − a). That is,

f(x) = 1/(b − a)    for a ≤ x ≤ b
f(x) = 0            elsewhere

The notation UNIF(a, b) is sometimes used to denote this distribution; its graph is exhibited in Fig.
6-8(a).

Fig. 6-8. (a) Graph of f; (b) Graph of F


The following theorem, proved in Problem 6.48, applies. 



Theorem 6.9: Let X be a continuous random variable with distribution UNIF(a, b). Then

(i) Expectation E(X) = (a + b)/2
(ii) Variance var(X) = (b − a)²/12
(iii) Cumulative distribution function:

F(x) = 0                  for x < a
F(x) = (x − a)/(b − a)    for a ≤ x ≤ b
F(x) = 1                  for x > b


The cumulative distribution function F(x) of UNIF(a, b) is exhibited in Fig. 6-8(b). Observe that 
F(x) is 0 before the interval [a, b], increases linearly from 0 to 1 on the interval [a, b], and then remains 
at 1 after the interval [a, b]. 


(b) Exponential Distribution:
A continuous random variable X is said to have an exponential distribution with parameter β (where
β > 0) if its density function f(x) has the form

f(x) = (1/β) e^(−x/β),    x > 0

and 0 elsewhere. The notation EXP(β) is sometimes used to denote such a distribution. A picture
of this distribution for various values of β appears in Fig. 6-9.

Fig. 6-9. Exponential distribution for various values of β.


The following theorem applies:
Theorem 6.10: Let X be a continuous random variable with distribution EXP(β). Then
(i) Expectation E(X) = β
(ii) Variance var(X) = β²
(iii) Cumulative distribution function:

F(x) = 1 − e^(−x/β),    x > 0
The exponential distribution also has the following important "no-memory property".
Theorem 6.11: Let X have an exponential distribution. Then

P(X > a + t | X > a) = P(X > t)

That is, suppose the lifetime of a certain solid-state component X is exponential. Theorem 6.11
states that the probability that X will last t units after it has already lasted a units is the same as the
probability that X will last t units when X was new. In other words, a used component is just as
reliable as a new component.


EXAMPLE 6.19 Suppose the lifetime X (in days) of a certain component C is exponential with β = 120. Find
the probability that the component C will last:
(a) less than 60 days, (b) more than 240 days.

The following are the density function f(x) and cumulative distribution function F(x) with β = 120:

f(x) = (1/120) e^(−x/120)    and    F(x) = 1 − e^(−x/120)

(a) The probability that C will last less than 60 days is
P(X < 60) = F(60) = 1 − e^(−0.5) ≈ 0.393
(b) The probability that C will last less than 240 days is
P(X < 240) = F(240) = 1 − e^(−2) ≈ 0.865
Hence P(X > 240) = 1 − F(240) = 1 − 0.865 = 0.135

These probabilities are pictured in Fig. 6-10.

Fig. 6-10. Exponential distribution with β = 120; the marked areas are 0.393 (x < 60) and 0.135 (x > 240).


EXAMPLE 6.20 Consider the component C in Example 6.19. If C is still working after 100 days, find the
probability that it will last more than 340 days.
By the "no-memory property" (Theorem 6.11) of the exponential distribution:

P(X > 340 | X > 100) = P(X > 240) = 0.135

That is, after working 100 days, the life expectancy of the used component C is the same as that of a new
component.
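
The exponential computations, including the no-memory property, can be verified numerically. The following Python sketch is an illustration, not part of the original text; the function names exp_cdf and exp_survival are hypothetical, and the conditional probability is computed from its definition P(X > 340)/P(X > 100).

```python
import math

def exp_cdf(x, beta):
    """F(x) = 1 - e^(-x/beta) for x > 0, and 0 otherwise."""
    return 1.0 - math.exp(-x / beta) if x > 0 else 0.0

def exp_survival(x, beta):
    """P(X > x) = 1 - F(x)."""
    return 1.0 - exp_cdf(x, beta)

beta = 120.0
print(f"P(X < 60)  = {exp_cdf(60, beta):.3f}")
print(f"P(X > 240) = {exp_survival(240, beta):.3f}")

# No-memory property: P(X > 340 | X > 100) = P(X > 340) / P(X > 100).
conditional = exp_survival(340, beta) / exp_survival(100, beta)
print(f"P(X > 340 | X > 100) = {conditional:.3f}")
```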
Solved Problems


BINOMIAL DISTRIBUTION
6.1. Compute P(k) for the binomial distribution B(n, p) where
(a) n = 5, p = 1/3, k = 2;    (b) n = 10, p = 1/2, k = 7;    (c) n = 4, p = 1/4, k = 3

Use Theorem 6.1 that P(k) = C(n, k) p^k q^(n−k) where q = 1 − p.

(a) Here q = 2/3, so P(2) = C(5, 2)(1/3)²(2/3)³ = 10(1/9)(8/27) = 80/243 ≈ 0.329
(b) Here q = 1/2, so P(7) = C(10, 7)(1/2)⁷(1/2)³ = 120(1/1024) = 120/1024 ≈ 0.117
(c) Here q = 3/4, so P(3) = C(4, 3)(1/4)³(3/4) = 4(1/64)(3/4) = 3/64 ≈ 0.047


6.2. A fair coin is tossed 3 times. Find the probability that there will appear: 
(a) 3 heads, (b) exactly 2 heads, (c) exactly 1 head, (d) no heads. 


Method 1: We obtain the following equiprobable space with 8 elements:
S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT} 
Then we simply count the number of ways the event can occur. 
(a) 3 heads HHH occurs only once; hence 
P(3 heads) = 1/8. 

(b) 2 heads occurs 3 times, HHT, HTH, THH; hence 

P(exactly 2 heads) = 3/8. 
(c) 1 head occurs 3 times, HTT, THT, TTH; hence 

P(exactly 1 head) = 3/8. 
(d) No heads, that is, 3 tails TTT, occurs only once; hence 


P(no heads) = P(3 tails) = 1/8. 


Method 2: Use Theorem 6.1 with n = 3 and p = q = 1/2.

(a) Here k = 3, so P = P(3) = C(3, 3)(1/2)³(1/2)⁰ = 1/8 = 0.125.
(b) Here k = 2, so P = P(2) = C(3, 2)(1/2)²(1/2) = 3/8 = 0.375.
(c) Here k = 1, so P = P(1) = C(3, 1)(1/2)(1/2)² = 3/8 = 0.375.
(d) Here k = 0, so P = P(0) = C(3, 0)(1/2)⁰(1/2)³ = 1/8 = 0.125.
6.3. The probability that John hits a target is p = 1/4. He fires n = 6 times. Find the probability
that he hits the target: (a) exactly 2 times, (b) more than 4 times, (c) at least once.


This is a binomial experiment with n = 6, p = 1/4, and q = 1 − p = 3/4; hence use Theorem 6.1.
(a) P(2) = C(6, 2)(1/4)²(3/4)⁴ = 15(3⁴)/(4⁶) = 1215/4096 ≈ 0.297.
(b) John hits the target more than 4 times if he hits it 5 or 6 times. Hence
P(X > 4) = P(5) + P(6) = C(6, 5)(1/4)⁵(3/4)¹ + (1/4)⁶
= 18/4⁶ + 1/4⁶ = 19/4⁶ = 19/4096 ≈ 0.0046
(c) Here q⁶ = (3/4)⁶ = 729/4096 is the probability that John misses all 6 times; hence
P(one or more) = 1 − 729/4096 = 3367/4096 ≈ 0.82


6.4. Suppose 20 percent of the items produced by a factory are defective. Suppose 4 items are 
chosen at random. Find the probability that: 


(a) 2 are defective, (b) 3 are defective, (c) none is defective. 


This is a binomial experiment with n = 4, p = 0.2 and q = 1 − p = 0.8, that is, B(4, 0.2). Hence use
Theorem 6.1.

(a) Here k = 2 and P(2) = C(4, 2)(0.2)²(0.8)² = 0.1536.
(b) Here k = 3 and P(3) = C(4, 3)(0.2)³(0.8) = 0.0256.
(c) Here P(0) = q⁴ = (0.8)⁴ = 0.4096 is the probability that none is defective. Hence the probability
that at least one is defective is
P(X > 0) = 1 − P(0) = 1 − 0.4096 = 0.5904


6.5. Team A has probability 2/3 of winning whenever it plays. Suppose A plays 4 games. Find the 
probability that A wins more than half of its games. 


This is a binomial experiment with n = 4, p = 2/3 and q = 1 − p = 1/3. A wins more than half its
games if it wins 3 or 4 games. Hence

P(X > 2) = P(3) + P(4) = C(4, 3)(2/3)³(1/3) + (2/3)⁴
= 32/81 + 16/81 = 48/81 ≈ 0.593


6.6. A family has 6 children. Find the probability P that there are: (a) 3 boys and 3 girls, (b) fewer 
boys than girls. (Assume that the probability of any particular child being a boy is 1/2.) 


This is a binomial experiment with n = 6 and p = q = 1/2.
(a) P = P(3 boys) = C(6, 3)(1/2)³(1/2)³ = 20/64 = 5/16.
(b) There are fewer boys than girls if there are 0, 1 or 2 boys. Hence
P = P(0) + P(1) + P(2)
= (1/2)⁶ + C(6, 1)(1/2)(1/2)⁵ + C(6, 2)(1/2)²(1/2)⁴
= 22/64 = 11/32

Alternatively, the probability of different numbers of boys and girls is 1 − 5/16 = 11/16. Half of these
cases will have fewer boys than girls; hence

P = (1/2)(11/16) = 11/32
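
The fractions obtained in Problems 6.3, 6.5, and 6.6 can be reproduced exactly with rational arithmetic. The following Python sketch is an illustration, not part of the original text; the function name binom_pmf and the variable names are hypothetical.

```python
from fractions import Fraction
from math import comb

def binom_pmf(k, n, p):
    """Exact binomial probability P(k) = C(n, k) p^k q^(n-k) as a fraction."""
    p = Fraction(p)
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Problem 6.3: n = 6 shots, p = 1/4.
john = Fraction(1, 4)
p63a = binom_pmf(2, 6, john)                          # exactly 2 hits
p63b = binom_pmf(5, 6, john) + binom_pmf(6, 6, john)  # more than 4 hits
p63c = 1 - binom_pmf(0, 6, john)                      # at least one hit

# Problem 6.5: n = 4 games, p = 2/3; team wins 3 or 4 games.
team = Fraction(2, 3)
p65 = binom_pmf(3, 4, team) + binom_pmf(4, 4, team)

# Problem 6.6: n = 6 children, p = 1/2.
half = Fraction(1, 2)
p66a = binom_pmf(3, 6, half)                          # 3 boys and 3 girls
p66b = sum(binom_pmf(k, 6, half) for k in (0, 1, 2))  # fewer boys than girls

for name, value in [("6.3(a)", p63a), ("6.3(b)", p63b), ("6.3(c)", p63c),
                    ("6.5", p65), ("6.6(a)", p66a), ("6.6(b)", p66b)]:
    print(f"{name}: {value} = {float(value):.4f}")
```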
6.7. Find the number of dice that must be thrown so that there is a better-than-even chance of
obtaining at least one six.

The probability of not obtaining a six on n dice is q = (5/6)^n. Thus, we seek the smallest
n such that q is less than 1/2. Compute:


(5/6)^1 = 5/6, (5/6)^2 = 25/36, (5/6)^3 = 125/216, (5/6)^4 = 625/1296 < 1/2


Thus, 4 dice must be thrown.


6.8. A certain type of missile hits its target with probability p = 0.3. Find the number of missiles
that should be fired so that there is at least a 90 percent probability of hitting the target.


The probability of missing the target is q = 1 - p = 0.7. Hence the probability that n missiles miss
the target is (0.7)^n. Thus, we seek the smallest n for which


1 - (0.7)^n ≥ 0.90 or equivalently (0.7)^n ≤ 0.10
Compute
(0.7)^1 = 0.7, (0.7)^2 = 0.49, (0.7)^3 = 0.343, (0.7)^4 = 0.240, (0.7)^5 = 0.168, (0.7)^6 = 0.118, (0.7)^7 = 0.0823


Thus, at least 7 missiles should be fired.
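
The search for the smallest n in the dice and missile problems can be sketched as a short loop. The helper name `min_trials` is hypothetical, and the sketch assumes the target probability is strictly less than 1.

```python
def min_trials(p_success: float, target: float) -> int:
    """Smallest n such that 1 - (1 - p_success)**n >= target."""
    n = 0
    all_miss = 1.0  # probability that every one of the n trials fails
    while 1 - all_miss < target:
        n += 1
        all_miss *= 1 - p_success
    return n

dice_needed = min_trials(1 / 6, 0.5)     # at least one six with chance above 1/2
missiles_needed = min_trials(0.3, 0.90)  # at least one hit with chance >= 90%
print(dice_needed, missiles_needed)
```

For the dice problem no n gives a probability of exactly 1/2, so testing with >= is equivalent to the strict "better than even" condition.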


6.9. The mathematics department has 8 graduate assistants who are assigned to the same
office. Each assistant is just as likely to study at home as in the office. Find the minimum
number m of desks that should be put in the office so that each assistant has a desk at least 90
percent of the time.


This problem can be modeled as a binomial experiment where


n = 8 = number of assistants assigned to the office
p = 1/2 = probability that an assistant will study in the office
X = number of assistants studying in the office


Suppose there are k desks in the office, where k ≤ 8. Then a graduate student will not have a desk if
X > k. Note that


P(X > k) = P(k + 1) + P(k + 2) + ··· + P(8)


We seek the smallest value of k for which P(X > k) is less than 10 percent.
Compute P(8), P(7), P(6), ... until the sum exceeds 10 percent. Using Theorem 6.1 with n = 8 and
p = q = 1/2, we obtain


P(8) = (1/2)^8 = 1/256, P(7) = 8(1/2)^7 (1/2) = 8/256, P(6) = 28(1/2)^6 (1/2)^2 = 28/256


Now P(8) + P(7) + P(6) = 37/256 > 10 percent but P(8) + P(7) = 9/256 < 10 percent. Thus, m = 6 desks are
needed.
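
The tail-sum search in the desk problem can be written as a loop that lowers k while the probability of a shortage stays below the allowed level. This is a sketch for the case p = 1/2 only; the function name `min_desks` and its parameters are illustrative assumptions.

```python
from fractions import Fraction
from math import comb

def min_desks(n: int, max_shortfall: Fraction) -> int:
    """Smallest k with P(X > k) < max_shortfall for X ~ B(n, 1/2)."""
    k = n
    tail = Fraction(0)  # invariant: tail == P(X > k)
    while k > 0:
        p_k = Fraction(comb(n, k), 2**n)  # P(X = k) when p = q = 1/2
        if tail + p_k >= max_shortfall:   # dropping to k - 1 desks would be too risky
            break
        tail += p_k
        k -= 1
    return k

desks = min_desks(8, Fraction(1, 10))
print(desks)
```

With 8 assistants the loop stops once adding P(6) = 28/256 would push the tail to 37/256, which exceeds 10 percent.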


6.10. A man fires at a target n = 6 times and hits it k = 2 times. (a) List the different ways that this
can happen. (b) How many ways are there?


(a) List all sequences with 2 S's (successes) and 4 F's (failures):


SSFFFF, SFSFFF, SFFSFF, SFFFSF, SFFFFS, FSSFFF, FSFSFF, FSFFSF,
FSFFFS, FFSSFF, FFSFSF, FFSFFS, FFFSSF, FFFSFS, FFFFSS


(b) There are 15 different ways as indicated by the list.
Observe that this is equal to $C(6, 2) = \binom{6}{2}$ since we are distributing k = 2 letters S among the
n = 6 positions in the sequence.
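
The list in part (a) can be generated mechanically by choosing which positions hold the letter S. A minimal sketch using the standard library (the variable names are illustrative):

```python
from itertools import combinations
from math import comb

n, k = 6, 2

# Choose the k positions (0-based) that hold an S; every other position holds an F
sequences = [
    "".join("S" if i in hits else "F" for i in range(n))
    for hits in combinations(range(n), k)
]

print(len(sequences), comb(n, k))
print(", ".join(sequences))
```

`combinations` yields the position pairs in lexicographic order, so the output follows the same order as the list in part (a).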
6.11. Prove Theorem 6.1. The probability of exactly k successes in a binomial experiment
B(n, p) is


$P(k) = P(k \text{ successes}) = \binom{n}{k} p^k q^{n-k}$


The probability of one or more successes is 1 - q^n.

The sample space of the n repeated trials consists of all n-tuples (that is, n-element sequences) whose
components are either S (success) or F (failure). Let A be the event of exactly k successes. Then A
consists of all n-tuples of which k components are S and n - k components are F. The number of such
n-tuples in the event A is equal to the number of ways that k letters S can be distributed among the n
components of an n-tuple; hence A consists of $C(n, k) = \binom{n}{k}$ sample points.


The probability of each point in A is p^k q^(n-k); hence


$P(A) = \binom{n}{k} p^k q^{n-k}$


In particular, the probability of no successes is

$P(0) = \binom{n}{0} p^0 q^n = q^n$


Thus, the probability of one or more successes is 1 - q^n.


EXPECTED VALUE AND STANDARD DEVIATION 


6.12. Four fair coins are tossed. Let X denote the number of heads occurring. Calculate the
expected value of X directly and compare with Theorem 6.2.
X is binomially distributed with n = 4 and p = q = 1/2. We have


P(0) = 1/16, P(1) = 4/16, P(2) = 6/16, P(3) = 4/16, P(4) = 1/16


Thus, the expected value is


E(X) = 0(1/16) + 1(4/16) + 2(6/16) + 3(4/16) + 4(1/16) = 32/16 = 2


This agrees with Theorem 6.2, which states that


E(X) = np = 4(1/2) = 2


6.13. A family has 8 children. [We assume male and female children are equally probable.] (a)
Determine the expected number E of girls. (b) Find the probability P that the expected
number of girls does occur.


(a) The number of girls is binomially distributed with n = 8 and p = q = 0.5. By Theorem 6.2,
E = np = 8(0.5) = 4
(b) We seek the probability of 4 girls. By Theorem 6.1, with k = 4,


$P = P(4 \text{ girls}) = \binom{8}{4}(0.5)^4(0.5)^4 = 70/256 \approx 0.27 = 27\%$
6.14. The probability that a man hits a target is p = 1/10 = 0.1. He fires n = 100 times. Find the
expected number E of times he will hit the target and the standard deviation σ.


This is a binomial experiment B(n, p) where n = 100, p = 0.1, and q = 1 - p = 0.9. Thus, apply
Theorem 6.2 to obtain


E = np = 100(0.1) = 10, σ² = npq = 100(0.1)(0.9) = 9, σ = √9 = 3


6.15. The probability is 0.02 that an item produced by a factory is defective. A shipment of 10,000
items is sent to a warehouse. Find the expected number E of defective items and the standard
deviation σ.


This is a binomial experiment B(n, p) with n = 10,000, p = 0.02, and q = 1 - p = 0.98. By
Theorem 6.2,


E = np = (10,000)(0.02) = 200, σ² = npq = (10,000)(0.02)(0.98) = 196, σ = √196 = 14


6.16. A student takes an 18-question multiple-choice exam, with 4 choices per question. Suppose
one of the choices is obviously incorrect, and the student makes an "educated" guess of
the remaining choices. Find the expected number E of correct answers and the standard
deviation σ.


This is a binomial experiment B(n, p) where n = 18, p = 1/3, and q = 1 - p = 2/3. Thus, apply
Theorem 6.2 to obtain


E = np = 18(1/3) = 6, σ² = npq = 18(1/3)(2/3) = 4, σ = √4 = 2


6.17. A fair die is tossed 300 times. Find the expected number E and the standard deviation σ of
the number of 2's.


The number of 2's is binomially distributed with n = 300 and p = 1/6. Also, q = 1 - p = 5/6. By
Theorem 6.2,


E = np = 300(1/6) = 50, σ² = npq = 300(1/6)(5/6) = 41.67, σ = √41.67 ≈ 6.45
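
The preceding problems all apply the same two formulas from Theorem 6.2, E = np and σ = √(npq). A short sketch that tabulates them, assuming the helper name `binomial_mean_sd` (not from the text):

```python
from math import sqrt

def binomial_mean_sd(n: int, p: float) -> tuple[float, float]:
    """Return (np, sqrt(npq)) for the binomial distribution B(n, p)."""
    q = 1 - p
    return n * p, sqrt(n * p * q)

# (n, p) pairs taken from the target, factory, exam, and die problems above
cases = {"target": (100, 0.1), "factory": (10_000, 0.02), "exam": (18, 1 / 3), "die": (300, 1 / 6)}
results = {name: binomial_mean_sd(n, p) for name, (n, p) in cases.items()}
for name, (mean, sd) in results.items():
    print(f"{name}: E = {mean:.2f}, sigma = {sd:.2f}")
```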


6.18. Prove Theorem 6.2. Let X be the binomial random variable B(n, p). Then:
(i) μ = E(X) = np, (ii) var(X) = npq.


On the sample space of n Bernoulli trials, let X_i (for i = 1, 2, ..., n) be the random variable which has
the value 1 or 0 according as the ith trial is a success or a failure. Then each X_i has the following
distribution:


x    | 0  1
P(x) | q  p


and the total number of successes is X = X_1 + X_2 + ··· + X_n.
(i) For each i, we have
E(X_i) = 0(q) + 1(p) = p
Using the linearity property of E [Theorem 5.3 and Corollary 5.4], we have
E(X) = E(X_1 + X_2 + ··· + X_n)
= E(X_1) + E(X_2) + ··· + E(X_n)
= p + p + ··· + p = np
(ii) For each i, we have
E(X_i²) = 0²(q) + 1²(p) = p


and


var(X_i) = E(X_i²) - [E(X_i)]² = p - p² = p(1 - p) = pq
The n random variables X_i are independent. Therefore, by Theorem 5.9,


var(X) = var(X_1 + X_2 + ··· + X_n)
= var(X_1) + var(X_2) + ··· + var(X_n)
= pq + pq + ··· + pq = npq


6.19. Give a direct proof of Theorem 6.2(i). Let X be the binomial random variable B(n, p). Then
μ = E(X) = np.


Using the notation $P(k) = \binom{n}{k} p^k q^{n-k}$, we obtain the following, where the last expression is obtained
by dropping the term with k = 0, since its value is 0, and by factoring out np from each term:


$E(X) = \sum_{k=0}^{n} k\,P(k) = \sum_{k=0}^{n} k\,\frac{n!}{k!\,(n-k)!}\,p^k q^{n-k} = np \sum_{k=1}^{n} \frac{(n-1)!}{(k-1)!\,(n-k)!}\,p^{k-1} q^{n-k}$


Let s = k - 1 in the above sum. As k runs through the values 1 to n, s runs through the values 0 to
n - 1. Therefore


$E(X) = np \sum_{s=0}^{n-1} \frac{(n-1)!}{s!\,(n-1-s)!}\,p^{s} q^{n-1-s} = np$

where, by the binomial theorem,

$\sum_{s=0}^{n-1} \frac{(n-1)!}{s!\,(n-1-s)!}\,p^{s} q^{n-1-s} = (p+q)^{n-1} = 1^{n-1} = 1$


Thus, the theorem is proved.


6.20. Give a direct proof of Theorem 6.2(ii). Let X be the binomial random variable B(n, p). Then
var(X) = npq.

We first compute E(X²). We have

$E(X^2) = \sum_{k=0}^{n} k^2 P(k) = \sum_{k=0}^{n} k^2\,\frac{n!}{k!\,(n-k)!}\,p^k q^{n-k} = np \sum_{k=1}^{n} k\,\frac{(n-1)!}{(k-1)!\,(n-k)!}\,p^{k-1} q^{n-k}$

Again we let s = k - 1 in the above sum and obtain

$E(X^2) = np \sum_{s=0}^{n-1} (s+1)\,\frac{(n-1)!}{s!\,(n-1-s)!}\,p^{s} q^{n-1-s}$

$= np \sum_{s=0}^{n-1} s\,\frac{(n-1)!}{s!\,(n-1-s)!}\,p^{s} q^{n-1-s} + np \sum_{s=0}^{n-1} \frac{(n-1)!}{s!\,(n-1-s)!}\,p^{s} q^{n-1-s}$


Using Theorem 6.2(i), the first sum in the last expression is equal to (n - 1)p; and, by the binomial
theorem, the second sum is equal to 1. Thus


E(X²) = np(n - 1)p + np = (np)² + np(1 - p) = (np)² + npq

Hence,


var(X) = E(X²) - μ² = (np)² + npq - (np)² = npq


Thus, the theorem is proved.
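
The identities E(X) = np and var(X) = npq can also be confirmed for a particular n and p by summing over the distribution directly. This is only a numerical check under exact rational arithmetic, not a substitute for the proofs; the function name `binomial_moments` is illustrative.

```python
from fractions import Fraction
from math import comb

def binomial_moments(n: int, p: Fraction) -> tuple[Fraction, Fraction]:
    """Return (E(X), var(X)) for X ~ B(n, p), computed by direct summation."""
    q = 1 - p
    pmf = [comb(n, k) * p**k * q ** (n - k) for k in range(n + 1)]
    mean = sum(k * pk for k, pk in enumerate(pmf))
    second_moment = sum(k * k * pk for k, pk in enumerate(pmf))
    return mean, second_moment - mean**2

# Example values n = 18, p = 1/3 (the exam problem above)
mean, variance = binomial_moments(18, Fraction(1, 3))
print(mean, variance)
```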


NORMAL DISTRIBUTION 


6.21. The mean and standard deviation on an examination are μ = 74 and σ = 12, respectively. Find
the scores in standard units of students receiving: (a) 65, (b) 74, (c) 86, (d) 92.

(a) z = (x - μ)/σ = (65 - 74)/12 = -0.75, (c) z = (x - μ)/σ = (86 - 74)/12 = 1.0
(b) z = (x - μ)/σ = (74 - 74)/12 = 0, (d) z = (x - μ)/σ = (92 - 74)/12 = 1.5


6.22. The mean and standard deviation on an examination are μ = 74 and σ = 12, respectively. Find
the grades corresponding to standard scores: (a) -1, (b) 0.5, (c) 1.25, (d) 1.75.


Solving z = (x - μ)/σ for x yields x = σz + μ. Thus


(a) x = σz + μ = (12)(-1) + 74 = 62, (c) x = σz + μ = (12)(1.25) + 74 = 89
(b) x = σz + μ = (12)(0.5) + 74 = 80, (d) x = σz + μ = (12)(1.75) + 74 = 95


6.23. Table 6-1 (page 184) uses Φ(z) to denote the area under the standard normal curve φ between
0 and z. Find: (a) Φ(1.63), (b) Φ(0.75), (c) Φ(1.1), (d) Φ(4.1).

Use Table 6-1 as follows:
(a) To find Φ(1.63), look down the first column on the left for the row labeled 1.6, and then continue right
for the column labeled 3. The entry is 0.4484. That is, the entry corresponding to row 1.6 and
column 3 is 0.4484. Hence Φ(1.63) = 0.4484.


(b) To find Φ(0.75), look down the first column on the left for the row labeled 0.7, and then continue right
for the column labeled 5. The entry is 0.2734. That is, the entry corresponding to row 0.7 and
column 5 is 0.2734. Hence Φ(0.75) = 0.2734.


(c) To find Φ(1.1), look on the left for the row labeled 1.1. The first entry in this row is 0.3643, which
corresponds to 1.1 = 1.10. Hence Φ(1.1) = 0.3643.


(d) The value of Φ(z) for any z ≥ 3.9 is 0.5000. Thus, Φ(4.1) = 0.5000 even though 4.1 is not in the
table.


6.24. Let Z be the random variable with standard normal distribution φ. Find the value of z if
(a) P(0 ≤ Z ≤ z) = 0.4429, (b) P(Z ≤ z) = 0.7967, (c) P(z ≤ Z ≤ 2) = 0.1000.
(a) Here z > 0. Thus, draw a picture of z and P(0 ≤ Z ≤ z) as in Fig. 6-11(a). Here Table 6-1 can be
used directly. The entry 0.4429 appears to the right of row 1.5 and under column 8. Thus,
z = 1.58.


(b) Note z must be positive since the probability is greater than 0.5. Thus, draw z and P(Z ≤ z) as in
Fig. 6-11(b). We have


Φ(z) = P(0 ≤ Z ≤ z) = P(Z ≤ z) - 0.5 = 0.7967 - 0.5000 = 0.2967


Since 0.2967 appears in row 0.8 and column 3, we get z = 0.83.
(c) Since Φ(2) = 0.4772 exceeds 0.1000, z must lie between 0 and 2. Thus, draw z and P(z ≤ Z ≤ 2)
as in Fig. 6-11(c). Then


Φ(z) = Φ(2) - P(z ≤ Z ≤ 2) = 0.4772 - 0.1000 = 0.3772
From Table 6-1, we get z = 1.16.


Fig. 6-11. Panels (a), (b), (c).


6.25. Let Z be the random variable with standard normal distribution φ. Find:
(a) P(0 ≤ Z ≤ 1.35), (b) P(-1.21 ≤ Z ≤ 0), (c) P(Z = 1.5).
(a) By definition Φ(z) is the area under the curve φ between 0 and z. Therefore, using Table 6-1,
P(0 ≤ Z ≤ 1.35) = Φ(1.35) = 0.4115
(b) By symmetry and Table 6-1,
P(-1.21 ≤ Z ≤ 0) = P(0 ≤ Z ≤ 1.21) = Φ(1.21) = 0.3869
(c) The area under a single point a = 1.5 is 0; hence


P(Z = 1.5) = 0


6.26. Let Z be the random variable with standard normal distribution φ. Find:
(a) P(-1.37 ≤ Z ≤ 2.01), (b) P(0.65 ≤ Z ≤ 1.26), (c) P(-1.79 ≤ Z ≤ -0.54).


Use the following formula (pictured in Fig. 6-3):


P(z1 ≤ Z ≤ z2) = Φ(z2) + Φ(|z1|) if z1 ≤ 0 ≤ z2
P(z1 ≤ Z ≤ z2) = Φ(z2) - Φ(z1) if 0 ≤ z1 ≤ z2
P(z1 ≤ Z ≤ z2) = Φ(|z1|) - Φ(|z2|) if z1 ≤ z2 ≤ 0


(a) Here -1.37 ≤ 0 ≤ 2.01, which is the first condition in the formula. Hence


P(-1.37 ≤ Z ≤ 2.01) = Φ(2.01) + Φ(1.37) = 0.4778 + 0.4147 = 0.8925


(b) Here 0 ≤ 0.65 ≤ 1.26, which is the second condition in the formula. Hence


P(0.65 ≤ Z ≤ 1.26) = Φ(1.26) - Φ(0.65) = 0.3962 - 0.2422 = 0.1540


(c) Here -1.79 ≤ -0.54 ≤ 0, which is the third condition in the formula. Hence
P(-1.79 ≤ Z ≤ -0.54) = Φ(1.79) - Φ(0.54) = 0.4633 - 0.2054 = 0.2579
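
The three-case formula can be sketched in code by computing the table area from the error function instead of reading Table 6-1. The names `phi_area` and `interval_prob` are illustrative assumptions; results may differ from the table in the fourth decimal place because the table entries are rounded.

```python
from math import erf, sqrt

def phi_area(z: float) -> float:
    """Area under the standard normal curve between 0 and z, for z >= 0."""
    return 0.5 * erf(z / sqrt(2))

def interval_prob(z1: float, z2: float) -> float:
    """P(z1 <= Z <= z2) via the three-case formula, assuming z1 <= z2."""
    if z1 <= 0 <= z2:
        return phi_area(z2) + phi_area(abs(z1))
    if 0 <= z1 <= z2:
        return phi_area(z2) - phi_area(z1)
    return phi_area(abs(z1)) - phi_area(abs(z2))  # remaining case: z1 <= z2 <= 0

prob_a = interval_prob(-1.37, 2.01)
prob_b = interval_prob(0.65, 1.26)
prob_c = interval_prob(-1.79, -0.54)
print(round(prob_a, 4), round(prob_b, 4), round(prob_c, 4))
```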


6.27. Let Z be the random variable with standard normal distribution φ. Find the following
one-sided probabilities:


(a) P(Z ≤ -0.22), (b) P(Z ≤ 0.33), (c) P(Z ≥ 0.44), (d) P(Z ≥ -0.55)


Figure 6-4 shows how to compute one-sided probabilities:
(a) P(Z ≤ -0.22) = 0.5 - Φ(0.22) = 0.5 - 0.0871 = 0.4129
(b) P(Z ≤ 0.33) = 0.5 + Φ(0.33) = 0.5 + 0.1293 = 0.6293
(c) P(Z ≥ 0.44) = 0.5 - Φ(0.44) = 0.5 - 0.1700 = 0.3300
(d) P(Z ≥ -0.55) = 0.5 + Φ(0.55) = 0.5 + 0.2088 = 0.7088


6.28. Suppose that the student IQ scores form a normal distribution with mean μ = 100 and standard
deviation σ = 20. Find the percentage P of students whose scores fall between:


(a) 80 and 120, (c) 40 and 160, (e) over 160,
(b) 60 and 140, (d) 100 and 120, (f) less than 80.


All the scores lie a whole number of standard deviations σ = 20 from the mean μ = 100; hence we can use
the 68-95-99.7 rule or Fig. 6-2 instead of Table 6-1 to obtain P as follows:


(a) P = P(80 ≤ IQ ≤ 120) = P(-1 ≤ Z ≤ 1) = 68%
(b) P = P(60 ≤ IQ ≤ 140) = P(-2 ≤ Z ≤ 2) = 95%
(c) P = P(40 ≤ IQ ≤ 160) = P(-3 ≤ Z ≤ 3) = 99.7%
(d) P = P(100 ≤ IQ ≤ 120) = P(0 ≤ Z ≤ 1) = 68%/2 = 34%
(e) Using (c) and symmetry, we have:
P = P(IQ ≥ 160) = P(Z ≥ 3) = [1 - 99.7%]/2 = 0.3%/2 = 0.15%
(f) P = P(IQ ≤ 80) = P(Z ≤ -1) = 50% - 34% = 16%


6.29. Suppose the temperature T during May is normally distributed with mean μ = 68° and standard
deviation σ = 6°. Find the probability p that the temperature during May is


(a) between 70° and 80°, (b) less than 60°.


First convert the T values into Z values in standard units, using z = (t - μ)/σ, draw the appropriate
picture, and then use Table 6-1.


(a) We convert as follows:


When t = 70, we get z = (70 - 68)/6 = 0.33.
When t = 80, we get z = (80 - 68)/6 = 2.00.


Since 0 < 0.33 < 2.00, draw Fig. 6-12(a). Then


p = P(70 ≤ T ≤ 80) = P(0.33 ≤ Z ≤ 2.00)
= Φ(2.00) - Φ(0.33) = 0.4772 - 0.1293 = 0.3479


(b) First we convert as follows:
When t = 60, we get z = (60 - 68)/6 = -1.33.


This is a one-sided probability with -1.33 < 0; hence draw Fig. 6-12(b). Using symmetry and that
half the area under the curve is 0.5000, we obtain


p = P(T ≤ 60) = P(Z ≤ -1.33) = P(Z ≥ 1.33)
= 0.5 - Φ(1.33) = 0.5000 - 0.4082 = 0.0918


Fig. 6-12. (a) P(0.33 ≤ Z ≤ 2.00); (b) P(Z ≤ -1.33).


6.30. Suppose the weights W of 800 male students are normally distributed with mean μ = 140 lb and
standard deviation σ = 10 lb. Find the number N of students with weights:


(a) between 138 and 148 lb, (b) more than 152 lb.


First convert the W values into Z values in standard units, using z = (w - μ)/σ, draw the appropriate
figure, and then use Table 6-1 (page 184).


(a) We convert as follows:
When w = 138, we get z = (138 - 140)/10 = -0.2.
When w = 148, we get z = (148 - 140)/10 = 0.8.
Since -0.2 < 0 < 0.8, draw Fig. 6-13(a). Then
P(138 ≤ W ≤ 148) = P(-0.2 ≤ Z ≤ 0.8)
= Φ(0.8) + Φ(0.2) = 0.2881 + 0.0793 = 0.3674


Thus, N = 800(0.3674) ≈ 294.
(b) We first convert as follows:
When w = 152, we get z = (152 - 140)/10 = 1.20


This is a one-sided probability with 0 < 1.20; hence draw Fig. 6-13(b). Using the fact that half the
area under the curve is 0.5000, we obtain


P(W ≥ 152) = P(Z ≥ 1.2) = 0.5 - Φ(1.2) = 0.5000 - 0.3849 = 0.1151
Thus, N = 800(0.1151) ≈ 92.


Fig. 6-13. (a) P(-0.2 ≤ Z ≤ 0.8); (b) P(Z ≥ 1.2).
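
The standardize-then-look-up procedure of the student weights problem can be sketched with a normal cumulative distribution function built from `math.erf`. The function name `normal_cdf` is an illustrative choice; it replaces the table lookup, so intermediate values are not rounded to two-decimal z scores.

```python
from math import erf, sqrt

def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """P(X <= x) for a normal random variable with mean mu and standard deviation sigma."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

students, mu, sigma = 800, 140, 10

# (a) weights between 138 and 148 lb, (b) weights above 152 lb
n_between = students * (normal_cdf(148, mu, sigma) - normal_cdf(138, mu, sigma))
n_above = students * (1 - normal_cdf(152, mu, sigma))
print(round(n_between), round(n_above))
```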


6.31. Let Z be the random variable with standard normal distribution φ. Find the value of z if
(a) P(z ≤ Z ≤ 1) = 0.4766, (b) P(z ≤ Z ≤ 1) = 0.7122.
By Table 6-1, Φ(1) = 0.3413, and so 2Φ(1) = 0.6826.


(a) We have Φ(1) < 0.4766 < 2Φ(1). Therefore, z is negative and -z < 1. Thus, draw z, -z, and
P(z ≤ Z ≤ 1) as in Fig. 6-14(a). Using symmetry, we obtain:


Φ(-z) = 0.4766 - 0.3413 = 0.1353
By Table 6-1, the nearest entry gives -z = 0.35. Hence z = -0.35.
(b) We have 2Φ(1) < 0.7122. Therefore, z is negative and -z > 1. Thus, draw z, -z, and
P(z ≤ Z ≤ 1) as in Fig. 6-14(b). Using symmetry, we obtain


Φ(-z) = 0.7122 - 0.3413 = 0.3709


By Table 6-1, -z = 1.13. Hence z = -1.13.


Fig. 6-14. Panels (a), (b).


6.32. Use linear interpolation in Table 6-1, which only gives values of Φ(z) in steps of 0.01 for z, to
solve the following:


(a) Find Φ(1.333). (b) Find z to the nearest thousandth, if Φ(z) = 0.2917.
(a) The linear interpolation is indicated in Fig. 6-15(a) where, by Table 6-1, Φ(1.33) = 0.4082 and
Φ(1.34) = 0.4099. We have

x/17 = 3/10 or x ≈ 5 and so P = 4082 + 5 = 4087


Thus, Φ(1.333) = 0.4087.
(b) The linear interpolation is indicated in Fig. 6-15(b) where, by Table 6-1, 2917 lies between 2910 and
2939. We have


x/10 = 7/29 or x ≈ 2 and so Q = 0.812

That is, z = 0.812.


Fig. 6-15. (a) Interpolating P between table entries 4082 (z = 1.330) and 4099 (z = 1.340); (b) interpolating Q between z = 0.810 (entry 2910) and z = 0.820 (entry 2939).
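
Linear interpolation between two neighboring table entries can be expressed as a single formula. The sketch below uses the standard entries Φ(0.81) = 0.2910 and Φ(0.82) = 0.2939 and interpolates in the inverse direction, from area back to z; the helper name `lerp` is an assumption for illustration.

```python
def lerp(x: float, x0: float, y0: float, x1: float, y1: float) -> float:
    """Value at x on the straight line through (x0, y0) and (x1, y1)."""
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

# Inverse lookup: areas play the role of x, z values the role of y
z_estimate = lerp(0.2917, 0.2910, 0.81, 0.2939, 0.82)
print(round(z_estimate, 3))
```

The proportion (0.2917 - 0.2910)/(0.2939 - 0.2910) = 7/29 is the same ratio used in the hand calculation.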


NORMAL APPROXIMATION TO THE BINOMIAL DISTRIBUTION 


This section of problems uses BP to denote the binomial probability and NP to denote the normal 
probability. 


6.33. A fair coin is tossed 12 times. Determine the probability P that the number of heads occurring 
is between 4 and 7 inclusive by using: (a) the binomial distribution, (b) the normal 
approximation to the binomial distribution. 
(a) Let heads denote a success. By Theorem 6.1, with n = 12 and p = q = 1/2,

$BP(4) = \binom{12}{4}(1/2)^4(1/2)^8 = 495/4096$, $BP(6) = \binom{12}{6}(1/2)^6(1/2)^6 = 924/4096$
$BP(5) = \binom{12}{5}(1/2)^5(1/2)^7 = 792/4096$, $BP(7) = \binom{12}{7}(1/2)^7(1/2)^5 = 792/4096$


Hence P = 495/4096 + 792/4096 + 924/4096 + 792/4096 = 3003/4096 = 0.7332.


Fig. 6-16. (a) BP(4 ≤ X ≤ 7); (b) NP(-1.45 ≤ Z ≤ 0.87).

(b) Here
μ = np = 12(1/2) = 6, σ² = npq = 12(1/2)(1/2) = 3, σ = √3 = 1.73


Let X denote the number of heads occurring. We seek BP(4 ≤ X ≤ 7), which corresponds to the
shaded area in Fig. 6-16(a). On the other hand, if we assume the data are continuous, in order to
apply the normal approximation, we must find NP(3.5 ≤ X ≤ 7.5), as indicated in Fig. 6-16(a). We
convert x values to z values in standard units using Z = (X - μ)/σ. Thus:


3.5 in standard units = (3.5 - 6)/1.73 = -1.45
7.5 in standard units = (7.5 - 6)/1.73 = 0.87


Then, as indicated by Fig. 6-16(b),


P = NP(3.5 ≤ X ≤ 7.5) = NP(-1.45 ≤ Z ≤ 0.87)
= Φ(0.87) + Φ(1.45) = 0.3078 + 0.4265 = 0.7343


(Note that the relative error e = |(0.7332 - 0.7343)/0.7332| = 0.0015 is less than 0.2 percent.)
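
The comparison between the exact binomial probability and its normal approximation with the continuity correction (3.5 to 7.5) can be reproduced numerically. This sketch computes the normal area from `math.erf` without rounding z to two decimals, so the approximate value can differ slightly from the table-based figure; the names are illustrative.

```python
from math import comb, erf, sqrt

def std_normal_cdf(z: float) -> float:
    """P(Z <= z) for the standard normal variable Z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

n, p = 12, 0.5
lo, hi = 4, 7  # heads between 4 and 7 inclusive

# Exact binomial probability BP(4 <= X <= 7); valid as written because p = 1/2
exact = sum(comb(n, k) for k in range(lo, hi + 1)) / 2**n

# Normal approximation with continuity correction: NP(3.5 <= X <= 7.5)
mu, sigma = n * p, sqrt(n * p * (1 - p))
approx = std_normal_cdf((hi + 0.5 - mu) / sigma) - std_normal_cdf((lo - 0.5 - mu) / sigma)

print(round(exact, 4), round(approx, 4), round(abs(exact - approx) / exact, 4))
```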
6.34. A fair die is tossed 180 times. Determine the probability P that the face 6 will appear:
(a) between 29 and 32 times inclusive, (b) between 31 and 35 times inclusive,
(c) less than 22 times.
This is a binomial experiment B(n, p) with n = 180, p = 1/6 and q = 1 - p = 5/6. Then
μ = np = 180(1/6) = 30, σ² = npq = 180(1/6)(5/6) = 25, σ = 5
Let X denote the number of times the face 6 appears.


(a) We seek BP(29 ≤ X ≤ 32) or, assuming the data are continuous, NP(28.5 ≤ X ≤ 32.5). Converting
x values into standard units, we get:


28.5 in standard units = (28.5 - 30)/5 = -0.3
32.5 in standard units = (32.5 - 30)/5 = 0.5


Thus, as shown in Fig. 6-17(a),
P = NP(28.5 ≤ X ≤ 32.5) = NP(-0.3 ≤ Z ≤ 0.5)
= Φ(0.5) + Φ(0.3) = 0.1915 + 0.1179 = 0.3094


Fig. 6-17. Panels (a), (b), (c).


(b) We seek BP(31 ≤ X ≤ 35) or, assuming the data are continuous, NP(30.5 ≤ X ≤ 35.5). Converting
x values into standard units, we get:


30.5 in standard units = (30.5 - 30)/5 = 0.1
35.5 in standard units = (35.5 - 30)/5 = 1.1


Thus, as shown in Fig. 6-17(b),
P = NP(30.5 ≤ X ≤ 35.5) = NP(0.1 ≤ Z ≤ 1.1)
= Φ(1.1) - Φ(0.1) = 0.3643 - 0.0398 = 0.3245


(c) We seek the one-sided probability BP(X < 22) or, approximately, NP(X ≤ 21.5). (See remark below
and in Section 6.6 on one-sided probabilities.) We have


21.5 in standard units = (21.5 - 30)/5 = -1.7


Therefore, as shown in Fig. 6-17(c), using symmetry and that half the area under the curve is 0.5000,


P = NP(X ≤ 21.5) = NP(Z ≤ -1.7) = 0.5000 - Φ(1.7) = 0.5000 - 0.4554 = 0.0446


Remark: Since the binomial variable is never negative, we should actually replace BP(X < 22) by


BP(0 ≤ X < 22) ≈ NP(-0.5 ≤ X ≤ 21.5) = NP(-6.1 ≤ Z ≤ -1.7)
= NP(Z ≤ -1.7) - NP(Z ≤ -6.1)


However, NP(Z ≤ -6.1) is very small and so it is neglected.
6.35. Among 10,000 random digits, find the probability P that the digit 3 appears: (a) between 975
and 1025 times, (b) at most 950 times.


This is a binomial experiment B(n, p) with n = 10,000, p = 0.1 and q = 1 - p = 0.9. Then
μ = np = 10,000(0.1) = 1000, σ² = npq = 10,000(0.1)(0.9) = 900, σ = 30
Let X denote the number of times 3 appears.
(a) We seek BP(975 ≤ X ≤ 1025) or, approximately, NP(974.5 ≤ X ≤ 1025.5). We have


974.5 in standard units = (974.5 - 1000)/30 = -0.85
1025.5 in standard units = (1025.5 - 1000)/30 = 0.85


Therefore,
P = NP(974.5 ≤ X ≤ 1025.5) = NP(-0.85 ≤ Z ≤ 0.85)
= 2Φ(0.85) = 2(0.3026) = 0.6052


(b) We seek the one-sided probability BP(X ≤ 950) or, approximately, NP(X ≤ 950.5). (See the remark in
Section 6.6.) We have


950.5 in standard units = (950.5 - 1000)/30 = -1.65
Therefore
P = NP(X ≤ 950.5) = NP(Z ≤ -1.65)
= 0.5000 - Φ(1.65) = 0.5000 - 0.4505 = 0.0495


6.36. Assume that 4 percent of the population over 65 years old has Alzheimer’s disease. Suppose 
a random sample of 3500 people over 65 is taken. Find the probability P that fewer than 150 
of them have the disease. 


This is a binomial experiment B(n, p) with n = 3500, p = 0.04, and q = 1 - p = 0.96. Then
μ = np = (3500)(0.04) = 140, σ² = npq = (3500)(0.04)(0.96) = 134.4, σ = √134.4 = 11.6


Let X denote the number of people with Alzheimer's disease.
We seek BP(X < 150) or, approximately, NP(X ≤ 149.5). We have


149.5 in standard units = (149.5 - 140)/11.6 = 0.82


Therefore


P = NP(X ≤ 149.5) = NP(Z ≤ 0.82) = 0.5000 + Φ(0.82) = 0.5000 + 0.2939 = 0.7939


POISSON DISTRIBUTION 


6.37. Find: (a) e^(-1.3), (b) e^(-2.5).
Use Table 6-2 (page 192) and the law of exponents.


(a) e^(-1.3) = (e^(-1))(e^(-0.3)) = (0.368)(0.741) = 0.273
(b) e^(-2.5) = (e^(-2))(e^(-0.5)) = (0.135)(0.607) = 0.0819
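6.38. For the Poisson distribution $P(k) = f(k; \lambda) = \frac{\lambda^k e^{-\lambda}}{k!}$, find:


(a) f(2; 1), (b) f(3; 1/2), (c) f(2; 0.7).
Use Table 6-2 to obtain e^(-λ).


(a) $f(2; 1) = \frac{1^2 e^{-1}}{2!} = \frac{e^{-1}}{2} = \frac{0.368}{2} = 0.184$.

(b) $f(3; 1/2) = \frac{(1/2)^3 e^{-0.5}}{3!} = \frac{e^{-0.5}}{48} = \frac{0.607}{48} \approx 0.013$.

(c) $f(2; 0.7) = \frac{(0.7)^2 e^{-0.7}}{2!} = \frac{(0.49)(0.497)}{2} = 0.12$.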

6.39. Suppose 300 misprints are distributed randomly throughout a book of 500 pages. Find the
probability P that a given page contains: (a) exactly 2 misprints, (b) 2 or more misprints.


We view the number of misprints on one page as the number of successes in a sequence of Bernoulli
trials. Here n = 300 since there are 300 misprints, and p = 1/500, the probability that a misprint appears
on a given page. Since p is small, we use the Poisson approximation to the binomial distribution with
λ = np = 0.6.


(a) $P = f(2; 0.6) = \frac{(0.6)^2 e^{-0.6}}{2!} = \frac{(0.36)(0.549)}{2} = 0.0988 \approx 0.1$.
(b) We have

$P(0 \text{ misprints}) = f(0; 0.6) = \frac{(0.6)^0 e^{-0.6}}{0!} = e^{-0.6} = 0.549$

$P(1 \text{ misprint}) = f(1; 0.6) = \frac{(0.6)^1 e^{-0.6}}{1!} = (0.6)(0.549) = 0.329$


Then P = 1 - P(0 or 1 misprint) = 1 - (0.549 + 0.329) = 0.122.
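
The misprint probabilities can be recomputed without Table 6-2 by evaluating the Poisson formula directly. A minimal sketch follows; the function name `poisson_pmf` is an illustrative choice.

```python
from math import exp, factorial

def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability f(k; lam) = lam**k * e**(-lam) / k!."""
    return lam**k * exp(-lam) / factorial(k)

lam = 300 * (1 / 500)  # lambda = np = 0.6 misprints per page on average

p_exactly_two = poisson_pmf(2, lam)
p_two_or_more = 1 - poisson_pmf(0, lam) - poisson_pmf(1, lam)
print(round(p_exactly_two, 4), round(p_two_or_more, 3))
```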


6.40. Suppose 2 percent of the items produced by a factory are defective. Find the probability P that
there are 3 defective items in a sample of 100 items.


The binomial distribution with n = 100 and p = 0.02 applies. However, since p is small, we use the
Poisson approximation with λ = np = 2. Thus


$P = f(3; 2) = \frac{2^3 e^{-2}}{3!} = \frac{8(0.135)}{6} = 0.180$


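6.41. Show that the Poisson distribution f(k; λ) is a probability distribution, that is,


$\sum_{k=0}^{\infty} f(k; \lambda) = 1$


By known results of analysis, $e^{\lambda} = \sum_{k=0}^{\infty} \lambda^k / k!$. Hence


$\sum_{k=0}^{\infty} f(k; \lambda) = \sum_{k=0}^{\infty} \frac{\lambda^k e^{-\lambda}}{k!} = e^{-\lambda} \sum_{k=0}^{\infty} \frac{\lambda^k}{k!} = e^{-\lambda} e^{\lambda} = 1$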
6.42. Prove Theorem 6.5. Let X be a random variable with the Poisson distribution f(k; λ). Then
(i) E(X) = λ, (ii) var(X) = λ. Hence σ_X = √λ.


(i) Using $f(k; \lambda) = \lambda^k e^{-\lambda}/k!$, we obtain the following where, in the last sum, we drop the term k = 0,
since its value is 0, and we factor out λ from each term:


$E(X) = \sum_{k=0}^{\infty} k\,f(k; \lambda) = \sum_{k=0}^{\infty} k\,\frac{\lambda^k e^{-\lambda}}{k!} = \lambda \sum_{k=1}^{\infty} \frac{\lambda^{k-1} e^{-\lambda}}{(k-1)!}$


Let s = k - 1 in the above last sum. As k runs through the values 1 to ∞, s runs through the values
0 to ∞. Using $\sum_{s=0}^{\infty} f(s; \lambda) = 1$ by the preceding Problem 6.41, we get


$E(X) = \lambda \sum_{s=0}^{\infty} \frac{\lambda^{s} e^{-\lambda}}{s!} = \lambda \sum_{s=0}^{\infty} f(s; \lambda) = \lambda$


Thus, (i) is proved.
(ii) We compute E(X²) as follows where, again, in the last sum, we drop the term k = 0, since its value
is 0, and we factor out λ from each term:


$E(X^2) = \sum_{k=0}^{\infty} k^2 f(k; \lambda) = \sum_{k=0}^{\infty} k^2\,\frac{\lambda^k e^{-\lambda}}{k!} = \lambda \sum_{k=1}^{\infty} k\,\frac{\lambda^{k-1} e^{-\lambda}}{(k-1)!}$


Again we let s = k - 1 and obtain:


$E(X^2) = \lambda \sum_{s=0}^{\infty} (s+1)\,\frac{\lambda^{s} e^{-\lambda}}{s!} = \lambda \sum_{s=0}^{\infty} (s+1)\,f(s; \lambda)$
We break up the last sum into two sums to obtain the following where we use (i) to obtain λ for the
first sum and Problem 6.41 to obtain 1 for the second sum:


$E(X^2) = \lambda \sum_{s=0}^{\infty} s\,f(s; \lambda) + \lambda \sum_{s=0}^{\infty} f(s; \lambda) = \lambda(\lambda) + \lambda(1) = \lambda^2 + \lambda$


Hence var(X) = E(X²) - μ² = λ² + λ - λ² = λ


Thus, (ii) is proved.


6.43. Show that if p is small and n is large, then the binomial distribution B(n, p) is approximated by
the Poisson distribution POI(λ) where λ = np. That is, using

$BP(k) = \binom{n}{k} p^k q^{n-k}$ and $f(k; \lambda) = \frac{\lambda^k e^{-\lambda}}{k!}$

we get BP(k) ≈ f(k; λ) where np = λ.
We have BP(0) = (1 - p)^n = (1 - λ/n)^n. Taking the natural logarithm of both sides yields:
ln BP(0) = n ln(1 - λ/n)


The Taylor expansion of the natural logarithm follows:


$\ln(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots$

Thus $\ln\left(1 - \frac{\lambda}{n}\right) = -\frac{\lambda}{n} - \frac{\lambda^2}{2n^2} - \frac{\lambda^3}{3n^3} - \cdots$

Therefore, when n is large,

$\ln BP(0) = n \ln\left(1 - \frac{\lambda}{n}\right) = -\lambda - \frac{\lambda^2}{2n} - \frac{\lambda^3}{3n^2} - \cdots \approx -\lambda$


Hence BP(0) ≈ e^(-λ).
Furthermore, if p is very small, then q ≈ 1. Thus


$\frac{BP(k)}{BP(k-1)} = \frac{(n-k+1)p}{kq} = \frac{np - (k-1)p}{kq} = \frac{\lambda - (k-1)p}{kq} \approx \frac{\lambda}{k}$


That is, BP(k) ≈ λ BP(k - 1)/k. Thus, using BP(0) ≈ e^(-λ), we get
BP(1) ≈ λ e^(-λ), BP(2) ≈ λ² e^(-λ)/2!, BP(3) ≈ λ³ e^(-λ)/3!


And so on. That is, by induction, BP(k) ≈ λ^k e^(-λ)/k! = f(k; λ)
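
The quality of the Poisson approximation can be inspected side by side for particular values. The sketch below uses n = 100 and p = 0.02 (so λ = 2) as example values; the function names are illustrative.

```python
from math import comb, exp, factorial

def binomial_pmf(k: int, n: int, p: float) -> float:
    """BP(k) = C(n, k) p^k q^(n-k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k: int, lam: float) -> float:
    """f(k; lam) = lam^k e^(-lam) / k!."""
    return lam**k * exp(-lam) / factorial(k)

n, p = 100, 0.02
lam = n * p

# One row per k: (k, exact binomial value, Poisson approximation)
rows = [(k, binomial_pmf(k, n, p), poisson_pmf(k, lam)) for k in range(6)]
max_gap = max(abs(b - f) for _, b, f in rows)
for k, b, f in rows:
    print(k, round(b, 4), round(f, 4))
```

For these values the two columns agree to about two decimal places; the agreement improves as n grows with np held fixed.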


MISCELLANEOUS DISTRIBUTIONS AND PROBLEMS 


6.44. The painted light bulbs produced by a factory are 50 percent red, 30 percent green, and 20
percent blue. In a sample of 5 bulbs, find the probability P that 2 are red, 1 is green, and 2 are
blue.


This is a multinomial distribution. By Theorem 6.6,


$P = \frac{5!}{2!\,1!\,2!}(0.5)^2(0.3)(0.2)^2 = 0.09$
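
The multinomial probability for 2 red, 1 green, and 2 blue bulbs in a sample of 5 can be evaluated directly. This is a sketch assuming Python 3.8 or later (for `math.prod`); the function name `multinomial_pmf` is hypothetical.

```python
from math import factorial, prod

def multinomial_pmf(counts: tuple[int, ...], probs: tuple[float, ...]) -> float:
    """n!/(k1! k2! ... kr!) * p1^k1 * p2^k2 * ... * pr^kr, with n = sum of counts."""
    coefficient = factorial(sum(counts))
    for c in counts:
        coefficient //= factorial(c)  # stays an exact integer at every step
    return coefficient * prod(p**c for p, c in zip(probs, counts))

# Counts (red, green, blue) with probabilities 0.5, 0.3, 0.2
p_bulbs = multinomial_pmf((2, 1, 2), (0.5, 0.3, 0.2))
print(round(p_bulbs, 4))
```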


6.45. A committee of 4 is selected at random from a class with 12 students of whom 7 are men. Find 
the probability P that the committee contains: (a) exactly 2 men, (b) at least 2 men. 


This is a hypergeometric distribution with N = 12, M = 7, n = 4. There are N - M = 5 women.


(a) There are C(12, 4) ways to choose the 4-person committee, C(7, 2) ways to choose the 2 men, and
C(5, 2) ways to choose the 2 women. Thus (Theorem 6.7)


$P = P(2) = \frac{C(7,2)\,C(5,2)}{C(12,4)} = \frac{(21)(10)}{495} = 0.424 = 42.4\%$


(b) Here P = P(2) + P(3) + P(4). Hence


$P = \frac{C(7,2)\,C(5,2) + C(7,3)\,C(5,1) + C(7,4)}{C(12,4)} = \frac{210 + 175 + 35}{495} = 0.848 = 84.8\%$
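
The committee probabilities can be verified with exact fractions. The sketch below uses N = 12 students, M = 7 men, and a committee of n = 4; the function name `hypergeom_pmf` is an illustrative choice.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(k: int, N: int, M: int, n: int) -> Fraction:
    """P(exactly k of the n chosen come from the M marked items among N)."""
    return Fraction(comb(M, k) * comb(N - M, n - k), comb(N, n))

N, M, n = 12, 7, 4
p_exactly_two_men = hypergeom_pmf(2, N, M, n)
p_at_least_two_men = sum(hypergeom_pmf(k, N, M, n) for k in range(2, n + 1))
print(float(p_exactly_two_men), float(p_at_least_two_men))
```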


6.46. Suppose the probability that team A wins each game in a tournament is 60 percent. A plays 
until it loses. 


(a) Find the expected number E of games that A plays. 


(b) Find the probability P that A plays in at least 4 games. 


(c) Find the probability P that A wins the tournament if the tournament has 64 teams. (Thus, 
a team winning 6 times wins the tournament.) 


This is a geometric distribution with p = 0.4 and q = 0.6. (A plays until A loses.) 


(a) By Theorem 6.8, E = 1/p = 1/0.4 = 2.5. 
(b) The only way A plays at least 4 games is if A wins the first 3 games. Thus (Theorem 6.8(iv))


P = P(k > 3) = q^3 = (0.6)^3 = 0.216 = 21.6%
(c) Here A must win all 6 games; hence P = (0.6)^6 = 0.0467 = 4.67%
6.47. Let X be a random variable with the following geometric distribution:


k    | 1   2    3      4      5      ...
P(k) | p   qp   q^2 p  q^3 p  q^4 p  ...


Prove Theorem 6.8(i): E(X) = 1/p.


Here all sums are from 1 to ∞. We have

$E(X) = \sum k q^{k-1} p = p \sum k q^{k-1}$


Let $y = \sum q^k = \frac{q}{1-q} = \frac{1}{1-q} - 1$


The derivative with respect to q yields


$\frac{dy}{dq} = \sum k q^{k-1} = \frac{1}{(1-q)^2}$


Substituting this value for $\sum k q^{k-1}$ in the formula for E(X) yields


$E(X) = \frac{p}{(1-q)^2} = \frac{p}{p^2} = \frac{1}{p}$


(Note that calculus is used to help evaluate the infinite series.)


6.48. Let X be the (uniform) continuous random variable with distribution UNIF(a, b), that is, whose
distribution function f is a constant k = 1/(b - a) on the interval I = [a, b] and zero elsewhere.
[See Fig. 6-8.] Prove Theorem 6.9: (i) E(X) = (a + b)/2, (ii) var(X) = (b - a)²/12,
(iii) cumulative distribution F(x) is equal to:

(1) 0 for x < a; (2) (x - a)/(b - a) for a ≤ x ≤ b; (3) 1 for x > b. [See Fig. 6-8(b).]


(i) If we view probability as weight or mass, and the mean as the center of gravity, then it is intuitively
clear that μ = (a + b)/2. We verify this mathematically using calculus:


$\mu = E(X) = \int_{-\infty}^{\infty} x f(x)\,dx = \int_a^b \frac{x}{b-a}\,dx = \left[\frac{x^2}{2(b-a)}\right]_a^b = \frac{b^2 - a^2}{2(b-a)} = \frac{a+b}{2}$


(ii) We have


$E(X^2) = \int_{-\infty}^{\infty} x^2 f(x)\,dx = \int_a^b \frac{x^2}{b-a}\,dx = \left[\frac{x^3}{3(b-a)}\right]_a^b = \frac{b^3 - a^3}{3(b-a)} = \frac{b^2 + ab + a^2}{3}$


Then


$\mathrm{var}(X) = E(X^2) - [E(X)]^2 = \frac{b^2 + ab + a^2}{3} - \frac{a^2 + 2ab + b^2}{4} = \frac{(b-a)^2}{12}$
(iii) We have three cases:
(1) For x < a:


$F(x) = \int_{-\infty}^{x} f(t)\,dt = \int_{-\infty}^{x} 0\,dt = 0$


(2) For a ≤ x ≤ b:


$F(x) = \int_{-\infty}^{x} f(t)\,dt = \int_a^x \frac{1}{b-a}\,dt = \frac{x-a}{b-a}$


(3) For x > b:
Since F(x) is a cumulative distribution function, F(x) ≥ F(b) = 1. But


F(x) = P(X ≤ x) ≤ 1
Hence F(x) = 1.


6.49. Consider the following normal distribution: 


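$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\tfrac{1}{2}(x-\mu)^2/\sigma^2\right]$


Show that f(x) is a continuous probability distribution, that is, show that $\int_{-\infty}^{\infty} f(x)\,dx = 1$.


Substituting t = (x - μ)/σ in $\int_{-\infty}^{\infty} f(x)\,dx$, we obtain the integral

$I = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-t^2/2}\,dt$


It suffices to show that I² = 1. We have


$I^2 = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-t^2/2}\,dt \int_{-\infty}^{\infty} e^{-s^2/2}\,ds = \frac{1}{2\pi} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{-(s^2+t^2)/2}\,ds\,dt$


We introduce polar coordinates in the above double integral. Let s = r cos θ and t = r sin θ.
Then ds dt = r dr dθ, 0 ≤ θ ≤ 2π, and 0 ≤ r < ∞. That is,


$I^2 = \frac{1}{2\pi} \int_0^{2\pi} \int_0^{\infty} r e^{-r^2/2}\,dr\,d\theta$


But $\int_0^{\infty} r e^{-r^2/2}\,dr = \left[-e^{-r^2/2}\right]_0^{\infty} = 1$


Hence $I^2 = \frac{1}{2\pi} \int_0^{2\pi} d\theta = 1$ and the theorem is proved.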
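6.50. Prove Theorem 6.3. Let X be a random variable with the normal distribution


$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\tfrac{1}{2}(x-\mu)^2/\sigma^2\right]$
Then (i) E(X) = μ and (ii) var(X) = σ². Hence σ_X = σ.


(i) By definition, $E(X) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} x \exp\left[-\tfrac{1}{2}(x-\mu)^2/\sigma^2\right] dx$. Setting t = (x - μ)/σ, we obtain


$E(X) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} (\sigma t + \mu)\,e^{-t^2/2}\,dt = \frac{\sigma}{\sqrt{2\pi}} \int_{-\infty}^{\infty} t\,e^{-t^2/2}\,dt + \mu\,\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-t^2/2}\,dt$


But $g(t) = t e^{-t^2/2}$ is an odd function, that is, g(-t) = -g(t); hence $\int_{-\infty}^{\infty} t e^{-t^2/2}\,dt = 0$. Furthermore,
$\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-t^2/2}\,dt = 1$, by the preceding problem. Accordingly, $E(X) = \frac{\sigma}{\sqrt{2\pi}} \cdot 0 + \mu \cdot 1 = \mu$ as
claimed.


(ii) By definition, $E(X^2) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} x^2 \exp\left[-\tfrac{1}{2}(x-\mu)^2/\sigma^2\right] dx$. Again setting t = (x - μ)/σ, we
obtain


$E(X^2) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} (\sigma t + \mu)^2 e^{-t^2/2}\,dt$

$= \sigma^2\,\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} t^2 e^{-t^2/2}\,dt + 2\mu\sigma\,\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} t\,e^{-t^2/2}\,dt + \mu^2\,\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-t^2/2}\,dt$


which reduces as above to $E(X^2) = \sigma^2\,\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} t^2 e^{-t^2/2}\,dt + \mu^2$


We integrate the above integral by parts. Let u = t and $dv = t e^{-t^2/2}\,dt$. Then $v = -e^{-t^2/2}$ and
du = dt. Thus

$\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} t^2 e^{-t^2/2}\,dt = \frac{1}{\sqrt{2\pi}} \left[-t e^{-t^2/2}\right]_{-\infty}^{\infty} + \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-t^2/2}\,dt = 0 + 1 = 1$

Consequently, E(X²) = σ² · 1 + μ² = σ² + μ² and
var(X) = E(X²) - μ² = σ² + μ² - μ² = σ²


Thus, the theorem is proved.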


Supplementary Problems 


BINOMIAL DISTRIBUTION 


6.51. Find P(k) for the binomial distribution B(n, p) where:
(a) n = 5, p = 1/3, k = 2; (b) n = 7, p = 1/2, k = 3; (c) n = 4, p = 1/4, k = 2.


6.52. A card is drawn and replaced 3 times from an ordinary 52-card deck. Find the probability that: 
(a) 2 hearts were drawn, (b) 3 hearts were drawn, (c) at least 1 heart was drawn. 
6.53. A box contains 3 red marbles and 2 white marbles. A marble is drawn and replaced 3 times from the
box. Find the probability that:
(a) 1 red marble was drawn, (b) 2 red marbles were drawn, (c) at least 1 red marble was drawn.


6.54. The batting average of a baseball player is 0.300. (That is, the probability that he gets a hit is 0.300.) He
comes to bat 4 times. Find the probability that he will get: (a) exactly 2 hits, (b) at least 1 hit.


6.55. The probability that Tom scores on a three-point basketball shot is p = 0.4. He shoots n = 5 times. Find
the probability that he scores: (a) exactly 2 times, (b) at least once.


6.56. Team A has probability p = 0.4 of winning each time it plays. Suppose A plays 4 games. Find the
probability that A wins: (a) half of the games, (b) at least 1 game, (c) more than half of the games.


6.57. An unprepared student takes a 5-question true-false quiz and guesses every answer. Find the probability
that the student will pass the quiz if at least 4 correct answers is the passing grade.


6.58. A certain type of missile hits its target with probability p = 1/5. (a) If 3 missiles are fired, find the
probability that the target is hit at least once. (b) Find the number of missiles that should be fired so that
there is at least a 90 percent probability of hitting the target (at least once).


6.59. A card is drawn and replaced in an ordinary 52-card deck. Find the number of times a card must be
drawn so that: (a) there is an even chance of drawing a heart, (b) the probability of drawing a heart is
greater than 75 percent.


6.60. A fair die is repeatedly tossed. Find the number of times the die must be tossed so that: (a) there is an
even chance of tossing a 6, (b) the probability of tossing a 6 is greater than 80 percent.


EXPECTED VALUE AND STANDARD DEVIATION 


6.61. Team B has probability p = 0.6 of winning each time it plays. Let X denote the number of times B wins in 4 games. (a) Find the distribution of X. (b) Find the mean $\mu$, variance $\sigma^2$, and standard deviation $\sigma$ of X.

6.62. Suppose 2 percent of the bolts produced by a factory are defective. In a shipment of 3600 bolts from the factory, find the expected number E of defective bolts and the standard deviation $\sigma$.

6.63. A fair die is tossed 180 times. Find the expected number E of times the face 6 occurs and the standard deviation $\sigma$.

6.64. Team A has probability p = 0.8 of winning each time it plays. Let X denote the number of times A will win in n = 100 games. Find the mean $\mu$, variance $\sigma^2$, and standard deviation $\sigma$ of X.

6.65. Let X be a binomially distributed random variable B(n, p) with E(X) = 2 and var(X) = 4/3. Find n and p.

6.66. Consider the binomial distribution B(n, p). Show that

(a) $\dfrac{P(k)}{P(k-1)} = \dfrac{(n-k+1)p}{kq}$;

(b) $P(k-1) < P(k)$ for $k < (n+1)p$ and $P(k-1) > P(k)$ for $k > (n+1)p$.
NORMAL DISTRIBUTION

6.67. Let Z be the standard normal random variable. Find:
(a) $P(-0.81 \le Z \le 1.13)$, (b) $P(-0.23 \le Z \le 1.6)$, (c) $P(0.53 \le Z \le 2.03)$, (d) $P(0.15 \le Z \le 1.50)$.

6.68. Let Z be the standard normal random variable. Find:
(a) $P(Z \le 0.73)$, (b) $P(Z \le 1.8)$, (c) $P(Z \ge 0.2)$, (d) $P(Z \ge -1.5)$, (e) $P(Z = 1.8)$, (f) $P(|Z| \le 0.25)$.

6.69. Let X be normally distributed with mean $\mu = 8$ and standard deviation $\sigma = 2$. Find the following without using Table 6-1:
(a) $P(6 \le X \le 10)$, (b) $P(4 \le X \le 12)$, (c) $P(4 \le X \le 10)$, (d) $P(4 \le X \le 6)$, (e) $P(6 \le X \le 12)$, (f) $P(8 \le X \le 10)$.

6.70. Let X be normally distributed with mean $\mu = 8$ and standard deviation $\sigma = 4$. Find:
(a) $P(5 \le X \le 10)$, (b) $P(10 \le X \le 15)$, (c) $P(3 \le X \le 9)$, (d) $P(3 \le X \le 7)$, (e) $P(X \ge 15)$, (f) $P(X \le 5)$.

6.71. Suppose the weights of 2000 male students are normally distributed with mean $\mu = 155$ lb and standard deviation $\sigma = 20$ lb. Find the number of students with weights:
(a) not more than 100 lb, (b) between 120 and 130 lb (inclusive), (c) between 150 and 175 lb (inclusive), (d) greater than or equal to 200 lb.

6.72. Suppose the diameter d of bolts manufactured by a company is normally distributed with mean $\mu = 0.5$ cm and standard deviation $\sigma = 0.4$ cm. A bolt is considered defective if $d \le 0.45$ cm or $d > 0.55$ cm. Find the percentage of defective bolts manufactured by the company.

6.73. Suppose the scores on an examination are normally distributed with mean $\mu = 76$ and standard deviation $\sigma = 15$. The top 15 percent of the students receive A's and the bottom 10 percent receive F's. Find: (a) the minimum score to receive an A, (b) the minimum score to pass (not to receive an F).


NORMAL APPROXIMATION TO THE BINOMIAL DISTRIBUTION 


6.74. A fair coin is tossed 10 times. Find the probability of obtaining between 4 and 7 heads inclusive by using:
(a) the binomial distribution, (b) the normal approximation to the binomial distribution.

6.75. A fair coin is tossed 400 times. Find the probability that the number of heads which occurs differs from 200 by:
(a) more than 10, (b) more than 25 times.

6.76. A fair die is tossed 720 times. Find the probability that the face 6 will occur:
(a) between 100 and 125 times inclusive, (b) more than 135 times, (c) less than 110 times.

6.77. Among 625 random digits, find the probability that the digit 7 appears:
(a) between 50 and 60 times, (b) between 60 and 70 times, (c) more than 75 times.
POISSON DISTRIBUTION

6.78. Find: (a) $e^{-1.6}$, (b) $e^{-2.3}$.

6.79. For the Poisson distribution $f(k; \lambda) = \lambda^k e^{-\lambda}/k!$, find:
(a) f(2; 1.5), (b) f(3; 1), (c) f(2; 0.6).

6.80. Suppose 220 misprints are distributed randomly throughout a book of 200 pages. Find the probability that a given page contains: (a) no misprints, (b) 1 misprint, (c) 2 misprints, (d) 2 or more misprints.

6.81. Suppose 1 percent of the items made by a machine are defective. In a sample of 100 items, find the probability that the sample contains: (a) no defective item, (b) 1 defective item, (c) 3 or more defective items.

6.82. Suppose 2 percent of the people on the average are left-handed. Find the probability of 3 or more left-handed among 100 people.

6.83. Suppose there is an average of 2 suicides per year per 50,000 population. In a city of 100,000, find the probability that in a given year the number of suicides is: (a) 0, (b) 1, (c) 2, (d) 2 or more.


MISCELLANEOUS DISTRIBUTIONS 


6.84. A die is loaded so that the faces occur with the following probabilities:

k    | 1    2    3    4    5    6
P(k) | 0.1  0.15 0.15 0.15 0.15 0.3

The die is tossed 6 times. Find the probability that: (a) each face occurs once, (b) the faces 4, 5, 6 each appear twice.

6.85. A box contains 5 red, 3 white, and 2 blue marbles. A sample of 6 marbles is drawn with replacement, that is, each marble is replaced before the next marble is drawn. Find the probability that:
(a) 3 are red, 2 are white, 1 is blue; (b) 2 are red, 3 are white, 1 is blue; (c) 2 of each color appear.

6.86. A box contains 8 red and 4 white marbles. Find the probability that a sample of size n = 4 will contain 2 red and 2 white marbles if the sampling is done: (a) without replacement, (b) with replacement.

6.87. Driving down a main street, the probability is 0.8 that the car meets a green light (go) instead of a red light (stop). (a) Find the expected number E of green lights the car meets before it must stop. (b) If the car "makes" the first 3 lights (they are green), find the expected number F of additional green lights the car meets before it must stop.

6.88. Let X be the continuous uniform random variable UNIF(1, 3). Find E(X), var(X), and cumulative distribution F(x).

6.89. Suppose the life expectancy X (in hours) of a transistor tube is exponential with $\beta = 180$, that is, the following are the distribution f(x) and cumulative distribution F(x) of X:

$$f(x) = (1/180)\, e^{-x/180} \quad \text{and} \quad F(x) = 1 - e^{-x/180}$$

Find the probability that the tube will last: (a) less than 36 h, (b) between 36 and 90 h, (c) more than 90 h.
6.90. Let X be the geometric random variable GEO(p). Using the relation $\sum_{k=1}^{\infty} k^2 q^k = q(q+1)/(1-q)^3$, show that
(a) $E(X^2) = (2-p)/p^2$, (b) $\mathrm{var}(X) = (1-p)/p^2$.

6.91. Let X be the geometric random variable GEO(p). Prove Theorem 6.8: (iii) Cumulative distribution $F(k) = 1 - q^k$. (iv) $P(k > r) = q^r$.

6.92. Show that the geometric random variable X = GEO(p) has the "no memory" property, that is,
$$P(k > r + s \mid k > s) = P(k > r)$$

Answers to Supplementary Problems

6.51. (a) 80/243; (b) 21/128; (c) 27/128.

6.52. (a) 9/64; (b) 1/64; (c) 37/64.

6.53. (a) 36/125; (b) 54/125; (c) 117/125.

6.54. (a) 0.264 6; (b) 0.759 9.

6.55. (a) 0.345 6; (b) 0.922.

6.56. (a) 216/625; (b) 544/625; (c) 112/625.

6.57. 6/32 = 18.75%.

6.58. (a) 1 - 64/125 = 61/125; (b) 11.

6.59. (a) 3; (b) 5.

6.60. (a) 4; (b) 9.

6.61. (a) [0, 1, 2, 3, 4; 16/625, 96/625, 216/625, 216/625, 81/625]; (b) $\mu = 2.4$, $\sigma^2 = 0.96$, $\sigma = 0.98$.

6.62. E = 72, $\sigma = 8.4$.

6.63. E = $\mu$ = 30, $\sigma = 5$.

6.64. $\mu = 80$, $\sigma^2 = 16$, $\sigma = 4$.

6.65. n = 6, p = 1/3.

6.67. (a) 0.661 8; (b) 0.536 2; (c) 0.276 9; (d) 0.334 5.

6.68. (a) 0.767 3; (b) 0.964 1; (c) 0.420 7; (d) 0.933 2; (e) 0; (f) 0.197 4.

6.69. (a) 68.2%; (b) 95.4%; (c) 81.8%; (d) 13.6%; (e) 81.8%; (f) 34.1%.

6.70. (a) 0.464 9; (b) 0.268 4; (c) 0.493 1; (d) 0.295 7; (e) 0.040 1; (f) 0.226 6.
6.71. (a) 6; (b) 131; (c) 880; (d) 24.

6.72. 7.3%.

6.73. (a) 92; (b) 57.

6.74. (a) 0.773 4; (b) 0.771 8.

6.75. (a) 0.293 8; (b) 0.010 8.

6.76. (a) 0.688 6; (b) 0.001 1.

6.77. (a) 0.351 8; (b) 0.513 1; (c) 0.041 8.

6.78. (a) 0.202; (b) 0.100.

6.79. (a) 0.251; (b) 0.061 3; (c) 0.098 8.

6.80. (a) 0.333; (b) 0.366; (c) 0.201; (d) 0.301.

6.81. (Here $\lambda = 1$.) (a) 0.368; (b) 0.368; (c) 0.080.

6.82. 0.325.

6.83. (a) 0.018 3; (b) 0.073 2; (c) 0.146 4; (d) 0.909.

6.84. (a) 0.010 9; (b) 0.001 03.

6.85. (a) 0.135; (b) 0.081 0; (c) 0.081 0.

6.86. (a) (28)(6)/495 = 0.339 = 33.9%; (b) 8/27 = 0.296 = 29.6%.

6.87. (a) E = 1/0.2 = 5; (b) (No memory) F = 1/0.2 = 5.

6.88. (Theorem 6.9.) E(X) = 2, var(X) = 1/3, F(x) = (x - 1)/2.

6.89. (a) 0.181; (b) 0.212; (c) 0.607.

6.90. (a) $E(X^2) = \sum k^2 p q^{k-1} = (p/q) \sum k^2 q^k = (2-p)/p^2$.

6.91. Hint: Use $1 + q + q^2 + \cdots + q^{k-1} = (1 - q^k)/(1 - q)$.
Markov Processes


7.1 INTRODUCTION 


This chapter investigates a sequence of repeated trials of an experiment in which the outcome at 
any step in the sequence depends, at most, on the outcome of the preceding step and not on any other 
previous outcome. Such a sequence is called a Markov chain or Markov process. 


EXAMPLE 7.1 


(a) A box contains 100 light bulbs of which 8 are defective. One light bulb after another is selected from the 
box and tested to see if it is defective. This is not an example of a Markov process. The outcome of the 
third trial does depend on the preceding two trials. 


(b) Three children, A, B, C, are throwing a ball to each other. A always throws the ball to B, and B always 
throws the ball to C. However, C is just as likely to throw the ball to B as to A. This is an example 
of a Markov process. Namely, the child throwing the ball is not influenced by those who previously had 
the ball. 


Elementary properties of vectors and matrices, especially the multiplication of matrices, are 
required for this chapter. Thus, we begin with a review of vectors and matrices. The entries in our 
vectors and matrices will be real numbers, and the real numbers will also be called scalars. 


7.2 VECTORS AND MATRICES 
A vector u is a list of n numbers, say, $a_1, a_2, \ldots, a_n$. Such a vector is denoted by

$$u = [a_1, a_2, \ldots, a_n]$$

The numbers $a_i$ are called the components or entries of u. If all the $a_i = 0$, then u is called the zero vector. By a scalar multiple ku of u (where k is a real number), we mean the vector obtained from u by multiplying each of its components by k, that is,

$$ku = [ka_1, ka_2, \ldots, ka_n]$$

Two vectors are equal if and only if their corresponding components are equal.

A matrix A is a rectangular array of numbers usually presented in the form

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}$$
The m horizontal lists of numbers are called the rows of A, and the n vertical lists of numbers are its columns. Thus, the following are the rows of the matrix A:

$$[a_{11}, a_{12}, \ldots, a_{1n}], \quad [a_{21}, a_{22}, \ldots, a_{2n}], \quad \ldots, \quad [a_{m1}, a_{m2}, \ldots, a_{mn}]$$

Furthermore, the following are the columns of the matrix A:

$$\begin{bmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{bmatrix}, \quad \begin{bmatrix} a_{12} \\ a_{22} \\ \vdots \\ a_{m2} \end{bmatrix}, \quad \ldots, \quad \begin{bmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{bmatrix}$$

Observe that the element $a_{ij}$ of A, called the ij entry, appears in row i and column j. We frequently denote such a matrix by writing $A = [a_{ij}]$.

A matrix with m rows and n columns is called an m by n matrix, written m × n. The pair of numbers m and n is called the size of the matrix. Two matrices A and B are equal, written A = B, if they have the same size and if corresponding elements are equal. Thus, the equality of two m × n matrices is equivalent to a system of mn equalities, one for each corresponding pair of elements.

A matrix with only one row may be viewed as a vector and vice versa. A matrix whose entries 
are all zero is called a zero matrix and will usually be denoted by 0. 


Square Matrices 


A square matrix is a matrix with the same number of rows and columns. In particular, a square matrix with n rows and n columns or, in other words, an n × n matrix, is said to be of order n and is called an n-square matrix.

The diagonal (or main diagonal) of an n-square matrix $A = [a_{ij}]$ consists of the elements

$$a_{11}, a_{22}, \ldots, a_{nn}$$

The n-square matrix with 1's on the diagonal and 0's elsewhere is called the unit matrix or identity matrix, and will usually be denoted by $I_n$ or simply I.


Multiplication of Matrices 


Now suppose A and B are two matrices such that the number of columns of A is equal to the number of rows of B, say A is an m × p matrix and B is a p × n matrix. Then the product of A and B, denoted by AB, is the m × n matrix C whose ij entry is obtained by multiplying the elements of row i of A by the corresponding elements of column j of B and then adding. That is, if $A = [a_{ik}]$ and $B = [b_{kj}]$, then

$$AB = \begin{bmatrix} a_{11} & \cdots & a_{1p} \\ \vdots & & \vdots \\ a_{i1} & \cdots & a_{ip} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mp} \end{bmatrix} \begin{bmatrix} b_{11} & \cdots & b_{1j} & \cdots & b_{1n} \\ \vdots & & \vdots & & \vdots \\ b_{p1} & \cdots & b_{pj} & \cdots & b_{pn} \end{bmatrix} = \begin{bmatrix} c_{11} & \cdots & c_{1n} \\ \vdots & c_{ij} & \vdots \\ c_{m1} & \cdots & c_{mn} \end{bmatrix} = C$$

where

$$c_{ij} = a_{i1} b_{1j} + a_{i2} b_{2j} + \cdots + a_{ip} b_{pj} = \sum_{k=1}^{p} a_{ik} b_{kj}$$

Namely, the product AB is the matrix $C = [c_{ij}]$, where $c_{ij}$ is defined above.

The product AB is not defined if A is an m × p matrix and B is a q × n matrix and p ≠ q. That is, AB is not defined if the number of columns of A is not equal to the number of rows of B.
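The definition of the product can be sketched in a few lines of code. The function name `mat_mul` and the representation of a matrix as a list of rows are assumptions made for this illustration; the sum over k is the formula for the ij entry given above.

```python
def mat_mul(A, B):
    """Return the product AB of an m x p matrix A and a p x n matrix B.

    Matrices are lists of rows. The ij entry of the result is the sum over k
    of A[i][k] * B[k][j].
    """
    p = len(B)
    if any(len(row) != p for row in A):
        # Number of columns of A differs from number of rows of B.
        raise ValueError("AB is not defined")
    n = len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(p)) for j in range(n)]
            for i in range(len(A))]

A = [[1, 2], [3, 4]]
print(mat_mul(A, A))   # the square of A
```

Calling `mat_mul` with a 1 × 3 matrix on the left and a 2 × 2 matrix on the right raises `ValueError`, mirroring the case in which the product is not defined.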
There are special cases of matrix multiplication which are of special interest for us. Suppose A is an n-square matrix. Then we can form all the powers of A, that is,

$$A^2 = AA, \quad A^3 = AA^2, \quad A^4 = AA^3, \quad \ldots$$

In addition, if u is a vector with n components, then we can form the product

$$uA$$

which is, again, a vector with n components. We call u ≠ 0 a fixed vector or fixed point of A if u is "left fixed", that is, not changed, when multiplied by A, that is, if

$$uA = u$$

In this case, for any scalar k ≠ 0, one can show that

$$(ku)A = k(uA) = ku$$

This yields the following theorem.

Theorem 7.1: If u is a fixed vector of a matrix A, then every nonzero scalar multiple ku of u is also a fixed vector of A.


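EXAMPLE 7.2

(a) Let $A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$. Then

$$A^2 = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} = \begin{bmatrix} 1+6 & 2+8 \\ 3+12 & 6+16 \end{bmatrix} = \begin{bmatrix} 7 & 10 \\ 15 & 22 \end{bmatrix}$$

(b) Let $u = [2, -1]$ and $A = \begin{bmatrix} 2 & 1 \\ 2 & 3 \end{bmatrix}$. Then

$$uA = [2, -1] \begin{bmatrix} 2 & 1 \\ 2 & 3 \end{bmatrix} = [4-2,\ 2-3] = [2, -1] = u$$

Thus, u is a fixed vector of A. Then, as expected from the above theorem, 2u = [4, -2] is also a fixed vector of A, namely,

$$(2u)A = [4, -2] \begin{bmatrix} 2 & 1 \\ 2 & 3 \end{bmatrix} = [8-4,\ 4-6] = [4, -2] = 2u$$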
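The fixed-vector condition uA = u and Theorem 7.1 can be checked mechanically. This is a small sketch; the names `vec_mat` and `is_fixed_vector` are hypothetical helpers introduced only for the illustration, and the matrix and vector are those of Example 7.2(b).

```python
def vec_mat(u, A):
    """Product uA of a row vector u with a matrix A given as a list of rows."""
    return [sum(u[i] * A[i][j] for i in range(len(u))) for j in range(len(A[0]))]

def is_fixed_vector(u, A):
    """True if u is nonzero and uA = u."""
    return any(x != 0 for x in u) and vec_mat(u, A) == list(u)

A = [[2, 1], [2, 3]]
u = [2, -1]

# Theorem 7.1: every nonzero scalar multiple ku is again a fixed vector.
for k in (1, 2, -3, 5):
    ku = [k * x for x in u]
    print(k, ku, is_fixed_vector(ku, A))
```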


7.3 PROBABILITY VECTORS AND STOCHASTIC MATRICES 


A vector $q = [q_1, q_2, \ldots, q_n]$ is called a probability vector if its entries are nonnegative and their sum is 1, that is, if:

(i) Each $q_i \ge 0$, (ii) $q_1 + q_2 + \cdots + q_n = 1$.

Recall that the probability distribution of a sample space S with n points has these two properties and hence forms a probability vector.

A square matrix $P = [p_{ij}]$ is called a stochastic matrix if each row of P is a probability vector. Thus, a probability vector may also be viewed as a stochastic matrix.

The following theorem (proved in Problem 7.8) applies. (The proof uses the fact that if u is a probability vector, then uA is also a probability vector.)

Theorem 7.2: Suppose A and B are stochastic matrices. Then the product AB is also a stochastic matrix. Thus, in particular, all powers $A^n$ are stochastic matrices.

We now define an important class of stochastic matrices.

Definition: A stochastic matrix P is said to be regular if all the entries of some power $P^m$ of P are positive.
EXAMPLE 7.3

(a) The nonzero vector u = [3, 1, 0, 5] is not a probability vector since the sum of its entries is 9, not 1. However, since the components of u are nonnegative, there is a unique probability vector $q_u$ which is a scalar multiple of u. This probability vector $q_u$ can be obtained by multiplying u by the reciprocal of the sum of its components. That is, the following is the unique probability vector which is a multiple of u:

$$q_u = \tfrac{1}{9} u = [3/9,\ 1/9,\ 0,\ 5/9]$$

(b) Consider the following two matrices:

$$A = \begin{bmatrix} 0 & 1 \\ 1/2 & 1/2 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 1 & 0 \\ 1/2 & 1/2 \end{bmatrix}$$

Both of them are stochastic matrices. In particular, A is regular since, as follows, all entries in $A^2$ are positive:

$$A^2 = \begin{bmatrix} 0 & 1 \\ 1/2 & 1/2 \end{bmatrix} \begin{bmatrix} 0 & 1 \\ 1/2 & 1/2 \end{bmatrix} = \begin{bmatrix} 1/2 & 1/2 \\ 1/4 & 3/4 \end{bmatrix}$$

On the other hand, one can show that B is not regular. Specifically

$$B^2 = \begin{bmatrix} 1 & 0 \\ 3/4 & 1/4 \end{bmatrix}, \quad B^3 = \begin{bmatrix} 1 & 0 \\ 7/8 & 1/8 \end{bmatrix}, \quad B^4 = \begin{bmatrix} 1 & 0 \\ 15/16 & 1/16 \end{bmatrix}$$

and every power $B^m$ of B will have 1 and 0 in the first row. Accordingly, B is not regular.
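Regularity depends only on which entries of the powers are zero, so it can be tested by tracking the zero pattern of P, P², P³, and so on. The sketch below is illustrative: the name `is_regular` is hypothetical, and the stopping point (n - 1)² + 1 for an n-square matrix is Wielandt's bound on the exponent of a primitive matrix, a fact brought in from outside this text.

```python
def is_regular(P):
    """Return True if some power of the stochastic matrix P has only positive entries.

    Only the zero/nonzero pattern matters, so powers are tracked as booleans.
    The search covers exponents 1 through (n - 1)**2 + 1 (Wielandt's bound,
    an outside assumption, not a result from the text).
    """
    n = len(P)
    base = [[P[i][j] > 0 for j in range(n)] for i in range(n)]
    power = base
    for _ in range((n - 1) ** 2 + 1):
        if all(all(row) for row in power):
            return True
        # Pattern of the next power: entry (i, j) is positive exactly when
        # some k has power[i][k] > 0 and base[k][j] > 0.
        power = [[any(power[i][k] and base[k][j] for k in range(n))
                  for j in range(n)] for i in range(n)]
    return False

A = [[0, 1], [0.5, 0.5]]
B = [[1, 0], [0.5, 0.5]]
print(is_regular(A), is_regular(B))
```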


The fundamental property of regular stochastic matrices is contained in the following theorem whose proof lies beyond the scope of this text.

Theorem 7.3: Let P be a regular stochastic matrix. Then:

(i) P has a unique fixed probability vector t, and the components of t are all positive.

(ii) The sequence $P, P^2, P^3, \ldots$ of powers of P approaches the matrix T whose rows are each the fixed point t.

(iii) If q is any probability vector, then the sequence of vectors $q, qP, qP^2, qP^3, \ldots$ approaches the fixed point t.

Note that $P^n$ approaches T means that each entry of $P^n$ approaches the corresponding entry of T, and $qP^n$ approaches t means that each component of $qP^n$ approaches the corresponding component of t.
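Part (iii) of the theorem can be observed numerically. The following sketch repeatedly multiplies several starting probability vectors by the regular matrix A of Example 7.3(b). The function names `step` and `limit` and the choice of 60 steps are assumptions made for the illustration, and floating-point arithmetic gives only approximate values.

```python
def step(q, P):
    """One step q -> qP for a row vector q and a square matrix P."""
    n = len(P)
    return [sum(q[i] * P[i][j] for i in range(n)) for j in range(n)]

def limit(q, P, steps=60):
    """Approximate the limit of q, qP, qP^2, ... by applying `steps` steps."""
    for _ in range(steps):
        q = step(q, P)
    return q

P = [[0.0, 1.0], [0.5, 0.5]]   # the regular matrix A of Example 7.3(b)
for q0 in ([1.0, 0.0], [0.0, 1.0], [0.3, 0.7]):
    print(q0, [round(x, 6) for x in limit(q0, P)])
```

Each starting vector leads to the same limiting vector, which is a fixed probability vector of P, as the theorem asserts.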


EXAMPLE 7.4 Consider the following stochastic matrix P [which is regular since $P^5$ has only positive entries]:

$$P = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1/2 & 1/2 & 0 \end{bmatrix}$$

Find the unique fixed probability vector t of P.
Method 1: We seek the probability vector t with three components such that tP = t. The vector t can be represented in the form [x, y, 1 - x - y]. Accordingly, we form the following matrix equation:

$$[x,\ y,\ 1-x-y] \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1/2 & 1/2 & 0 \end{bmatrix} = [x,\ y,\ 1-x-y]$$

Multiply the left side of the matrix equation, and then set corresponding components equal to each other. This yields the following system of linear equations:

$$\begin{aligned} \tfrac12 - \tfrac12 x - \tfrac12 y &= x \\ x + \tfrac12 - \tfrac12 x - \tfrac12 y &= y \\ y &= 1 - x - y \end{aligned} \quad \text{or} \quad \begin{aligned} 3x + y &= 1 \\ x - 3y &= -1 \\ x + 2y &= 1 \end{aligned} \quad \text{or} \quad \begin{aligned} x &= 1/5 \\ y &= 2/5 \end{aligned}$$

Thus, t = [1/5, 2/5, 2/5].

Method 2: We first seek any fixed vector u = [x, y, z] of the matrix P. Thus, we form the matrix equation:

$$[x,\ y,\ z] \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1/2 & 1/2 & 0 \end{bmatrix} = [x,\ y,\ z] \quad \text{or} \quad \begin{aligned} \tfrac12 z &= x \\ x + \tfrac12 z &= y \\ y &= z \end{aligned}$$

We know that the system has a nonzero solution; hence we can arbitrarily assign a value to one of the unknowns. Set z = 2. Then by the first equation x = 1 and by the third equation y = 2. Thus, u = [1, 2, 2] is a fixed point of P. But every multiple of u is also a fixed point of P. Accordingly, multiply u by 1/5 to obtain the following unique fixed probability vector of P:

$$t = \tfrac15 u = [1/5,\ 2/5,\ 2/5]$$
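The hand computation can be mirrored with exact rational arithmetic. The sketch below, with the hypothetical name `fixed_probability_vector`, solves tP = t together with the normalisation t₁ + ... + tₙ = 1 by Gauss-Jordan elimination over `Fraction`. It assumes P is a regular stochastic matrix, so that the solution is unique; one of the n balance equations is redundant and is dropped.

```python
from fractions import Fraction

def fixed_probability_vector(P):
    """Solve tP = t with sum(t) = 1 exactly, assuming P is regular stochastic."""
    n = len(P)
    # Balance equations for columns 0..n-2: sum_i t_i (P[i][j] - delta_ij) = 0.
    rows = [[Fraction(P[i][j]) - (1 if i == j else 0) for i in range(n)] + [Fraction(0)]
            for j in range(n - 1)]
    # Normalisation equation: t_1 + ... + t_n = 1.
    rows.append([Fraction(1)] * n + [Fraction(1)])
    # Gauss-Jordan elimination on the augmented n x (n + 1) system.
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [x / rows[col][col] for x in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [rows[i][n] for i in range(n)]

half = Fraction(1, 2)
P = [[0, 1, 0], [0, 0, 1], [half, half, 0]]
t = fixed_probability_vector(P)
print([str(x) for x in t])
```

Using `Fraction` rather than floating point keeps the answer in the same form as the text, so it can be compared directly with the vector found by Methods 1 and 2.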


7.4 TRANSITION MATRIX OF A MARKOV PROCESS 


A Markov process or chain consists of a sequence of repeated trials of an experiment whose 
outcomes have the following two properties: 


(i) Each outcome belongs to a finite set $\{a_1, a_2, \ldots, a_n\}$ called the state space of the system; if the outcome on the nth trial is $a_i$, then we say the system is in state $a_i$ at time n or at the nth step.

(ii) The outcome of any trial depends, at most, on the outcome of the preceding trial and not on any other previous outcome.

Accordingly, with each pair of states $(a_i, a_j)$, there is given the probability $p_{ij}$ that $a_j$ occurs immediately after $a_i$ occurs. The probabilities $p_{ij}$ form the following n-square matrix:

$$M = \begin{bmatrix} p_{11} & p_{12} & \cdots & p_{1n} \\ p_{21} & p_{22} & \cdots & p_{2n} \\ \vdots & \vdots & & \vdots \\ p_{n1} & p_{n2} & \cdots & p_{nn} \end{bmatrix}$$

This matrix M is called the transition matrix of the Markov process.

Observe that with each state $a_i$ there corresponds the ith row $[p_{i1}, p_{i2}, \ldots, p_{in}]$ of the transition matrix M. Moreover, if the system is in state $a_i$, then this row represents the probabilities of all the possible outcomes of the next trial and so it is a probability vector. We state this result formally.


Theorem 7.4: The transition matrix M of a Markov process is a stochastic matrix. 
EXAMPLE 7.5

(a) A man either takes a bus or drives his car to work each day. Suppose he never takes the bus 2 days in a row; but if he drives to work, then the next day he is just as likely to drive again as he is to take the bus.

This stochastic process is a Markov chain since the outcome on any day depends only on what happened the preceding day. The state space is {b (bus), d (drive)} and the transition matrix M follows:

$$M = \begin{array}{c|cc} & b & d \\ \hline b & 0 & 1 \\ d & 1/2 & 1/2 \end{array}$$

The first row of the matrix M corresponds to the fact that the man never takes the bus 2 days in a row, and so he definitely will drive the day after he takes the bus. The second row of M corresponds to the fact that the day after he drives he will drive or take the bus with equal probability.

(b) Three children, Ann (A), Bill (B), and Casey (C), are throwing a ball to each other. Ann always throws the ball to Bill, and Bill always throws the ball to Casey. However, Casey is just as likely to throw the ball to Bill as to Ann. The ball throwing is a Markov process with the following transition matrix:

$$M = \begin{array}{c|ccc} & A & B & C \\ \hline A & 0 & 1 & 0 \\ B & 0 & 0 & 1 \\ C & 1/2 & 1/2 & 0 \end{array}$$

The first row of the matrix corresponds to the fact that Ann always throws the ball to Bill. The second row of the matrix corresponds to the fact that Bill always throws the ball to Casey. The last row of the matrix corresponds to the fact that Casey always throws the ball to Ann or Bill with equal probability (and does not throw the ball to himself).

[Observe that this is the Markov process given in Example 7.1(b).]

(c) An elementary school contains 200 boys and 150 girls. One student is selected after another to take an eye examination. Let $X_n$ denote the sex of the nth student who takes the examination; hence the following is the state space of the stochastic process:

S = {m (male), f (female)}

This process is not a Markov process. For example, the probability that the third student is a girl depends not only on the outcome of the first trial but on the outcomes of both the first and second trials.


7.5 STATE DISTRIBUTIONS 


Consider a Markov process with transition matrix M. The kth state distribution of the Markov process is the following probability vector:

$$q_k = [q_{k1}, q_{k2}, \ldots, q_{kn}]$$

where $q_{ki}$ is the probability that the state $a_i$ occurs at the kth trial of the Markov chain.

Suppose the initial state distribution $q_0$ (at time t = 0) is given. Then the subsequent state distributions can be obtained by multiplying the preceding state distribution by the transition matrix M. Namely,

$$q_0 M = q_1, \quad q_1 M = q_2, \quad q_2 M = q_3, \quad \ldots$$

Accordingly

$$q_2 = q_1 M = (q_0 M) M = q_0 M^2, \qquad q_3 = q_2 M = (q_0 M^2) M = q_0 M^3$$

and so on. We state this result formally.

Theorem 7.5: Suppose an initial state distribution $q_0$ is given. Then, for k = 1, 2, ...,

$$q_k = q_{k-1} M = q_0 M^k$$
EXAMPLE 7.6 Consider the Markov chain in Example 7.5(b) with transition matrix M. Suppose Casey is the first person with the ball, that is, suppose $q_0 = [0, 0, 1]$ is the initial probability distribution. Then

$$q_1 = q_0 M = [0, 0, 1] \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1/2 & 1/2 & 0 \end{bmatrix} = [1/2,\ 1/2,\ 0]$$

$$q_2 = q_1 M = [1/2, 1/2, 0] \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1/2 & 1/2 & 0 \end{bmatrix} = [0,\ 1/2,\ 1/2]$$

$$q_3 = q_2 M = [0, 1/2, 1/2] \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1/2 & 1/2 & 0 \end{bmatrix} = [1/4,\ 1/4,\ 1/2]$$

Thus, after 3 throws, the probability that Ann has the ball is 1/4, that Bill has the ball is 1/4, and that Casey has the ball is 1/2.
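The successive state distributions of this example can be generated with a short loop that applies the rule that each distribution is the preceding one multiplied by M. The function name `state_distributions` is a hypothetical helper, and exact fractions are used so the output matches the hand computation.

```python
from fractions import Fraction

def state_distributions(q0, M, steps):
    """Return [q_0, q_1, ..., q_steps], where each q_k is q_{k-1} times M."""
    n = len(M)
    dists = [list(q0)]
    for _ in range(steps):
        prev = dists[-1]
        dists.append([sum(prev[i] * M[i][j] for i in range(n)) for j in range(n)])
    return dists

half = Fraction(1, 2)
M = [[0, 1, 0], [0, 0, 1], [half, half, 0]]   # states: Ann, Bill, Casey
dists = state_distributions([0, 0, 1], M, 3)  # Casey starts with the ball
for k, q in enumerate(dists):
    print(k, [str(x) for x in q])
```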


7.6 REGULAR MARKOV PROCESSES AND STATIONARY STATE DISTRIBUTIONS 


A Markov chain is said to be regular if its transition matrix M is regular. Recall Theorem 7.3: if M is regular then M has a unique fixed probability vector t and, for any probability vector q, the sequence

$$q, \quad qM, \quad qM^2, \quad qM^3, \quad \ldots$$

approaches the unique fixed point t. Thus, Theorems 7.3 and 7.5 give us the next basic result.

Theorem 7.6: Suppose the transition matrix M of a Markov chain is regular. Then, in the long run, the probability that any state $a_i$ occurs is approximately equal to the component $t_i$ of the unique fixed probability vector t of M.

Thus, we see that the effect of the initial state distribution in a regular Markov process wears off as the number of steps increases. That is, every sequence of state distributions approaches the fixed probability vector t of M, which is called the stationary distribution of the Markov chain.


EXAMPLE 7.7 


(a) Consider the Markov process in Example 7.5(b) where Ann, Bill, and Casey throw a ball to each other with the following transition matrix:

$$M = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1/2 & 1/2 & 0 \end{bmatrix}$$

By Example 7.4, t = [1/5, 2/5, 2/5] is the unique fixed probability vector of M. Thus, in the long run, Ann will be thrown the ball 20 percent of the time, and Bill and Casey will each be thrown the ball 40 percent of the time.

(b) Consider the Markov process in Example 7.5(a) where a man takes a bus or drives to work with the following transition matrix:

$$M = \begin{bmatrix} 0 & 1 \\ 1/2 & 1/2 \end{bmatrix}$$

Find the stationary distribution of the Markov process.

We seek a probability vector t = [x, 1 - x] such that tM = t. Thus, set

$$[x,\ 1-x] \begin{bmatrix} 0 & 1 \\ 1/2 & 1/2 \end{bmatrix} = [x,\ 1-x]$$

Multiply the left side of the matrix equation to obtain

$$[\tfrac12 - \tfrac12 x,\ \tfrac12 + \tfrac12 x] = [x,\ 1-x] \quad \text{or} \quad \begin{aligned} \tfrac12 - \tfrac12 x &= x \\ \tfrac12 + \tfrac12 x &= 1 - x \end{aligned} \quad \text{or} \quad x = \tfrac13$$

Thus, t = [1/3, 1 - 1/3] = [1/3, 2/3] is the unique fixed probability vector of M. Therefore, in the long run, the man will take the bus to work 1/3 of the time, and drive to work the other 2/3 of the time.
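The long-run interpretation can also be illustrated by simulating the chain directly. This is a rough sketch only: the function name `simulate_commute`, the starting state, the seed, and the number of days are all assumptions, and a simulation gives an approximate frequency rather than the exact stationary value.

```python
import random

def simulate_commute(days, seed=0):
    """Simulate the bus/drive chain of Example 7.5(a); return the fraction of bus days."""
    rng = random.Random(seed)
    state = "d"            # assumed starting state; its effect fades in the long run
    bus_days = 0
    for _ in range(days):
        if state == "b":
            state = "d"    # he never takes the bus 2 days in a row
        else:
            # after driving, bus and car are equally likely
            state = "b" if rng.random() < 0.5 else "d"
        if state == "b":
            bus_days += 1
    return bus_days / days

fraction_bus = simulate_commute(200_000)
print(fraction_bus)
```

With a large number of simulated days, the observed fraction of bus days should lie close to the first component of the stationary distribution.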


Solved Problems 
MATRIX MULTIPLICATION 


7.1. Let u = [1, -2, 4] and $A = \begin{bmatrix} 1 & 3 & -1 \\ 0 & 2 & 5 \\ 4 & 1 & 6 \end{bmatrix}$. Find uA.

The product of the three-component vector u by the 3 × 3 matrix A is again a three-component vector. To obtain the first component of uA, multiply the elements of u by the corresponding elements of the first column of A and then add as follows:

$$[1, -2, 4] \begin{bmatrix} 1 & 3 & -1 \\ 0 & 2 & 5 \\ 4 & 1 & 6 \end{bmatrix} = [\,1(1) - 2(0) + 4(4),\ \_,\ \_\,] = [17,\ \_,\ \_]$$

To obtain the second component of uA, multiply the elements of u by the corresponding elements of the second column of A and then add as follows:

$$[1, -2, 4] \begin{bmatrix} 1 & 3 & -1 \\ 0 & 2 & 5 \\ 4 & 1 & 6 \end{bmatrix} = [\,17,\ 1(3) - 2(2) + 4(1),\ \_\,] = [17,\ 3,\ \_]$$

To obtain the third component of uA, multiply the elements of u by the corresponding elements of the third column of A and then add as follows:

$$[1, -2, 4] \begin{bmatrix} 1 & 3 & -1 \\ 0 & 2 & 5 \\ 4 & 1 & 6 \end{bmatrix} = [\,17,\ 3,\ 1(-1) - 2(5) + 4(6)\,] = [17,\ 3,\ 13]$$

Namely, uA = [17, 3, 13].


7.2. Find AB where $A = \begin{bmatrix} 1 & 3 \\ 2 & -1 \end{bmatrix}$ and $B = \begin{bmatrix} 2 & 0 & -4 \\ 5 & -2 & 6 \end{bmatrix}$.

Since A is 2 × 2 and B is 2 × 3, the product AB is defined and AB is a 2 × 3 matrix. To obtain the first row of the product AB, multiply the first row (1, 3) of A by the corresponding elements of each of the columns $\begin{bmatrix} 2 \\ 5 \end{bmatrix}$, $\begin{bmatrix} 0 \\ -2 \end{bmatrix}$, $\begin{bmatrix} -4 \\ 6 \end{bmatrix}$ of B and then add:

$$AB = \begin{bmatrix} 2+15 & 0-6 & -4+18 \\ \_ & \_ & \_ \end{bmatrix} = \begin{bmatrix} 17 & -6 & 14 \\ \_ & \_ & \_ \end{bmatrix}$$

To obtain the second row of the product AB, multiply the second row (2, -1) of A by the corresponding elements of each of the columns of B, and then add:

$$AB = \begin{bmatrix} 17 & -6 & 14 \\ 4-5 & 0+2 & -8-6 \end{bmatrix} = \begin{bmatrix} 17 & -6 & 14 \\ -1 & 2 & -14 \end{bmatrix}$$

Remark: B is a 2 × 3 matrix and A is a 2 × 2 matrix, so the number 3 of columns of B is not equal to the number 2 of rows of A; hence the product BA is not defined.


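7.3. Let $A = \begin{bmatrix} 1 & 2 \\ 3 & 5 \end{bmatrix}$ and $B = \begin{bmatrix} 4 & 6 \\ 0 & -2 \end{bmatrix}$. Find AB and BA.

We have

$$AB = \begin{bmatrix} 4+0 & 6-4 \\ 12+0 & 18-10 \end{bmatrix} = \begin{bmatrix} 4 & 2 \\ 12 & 8 \end{bmatrix}$$

and

$$BA = \begin{bmatrix} 4+18 & 8+30 \\ 0-6 & 0-10 \end{bmatrix} = \begin{bmatrix} 22 & 38 \\ -6 & -10 \end{bmatrix}$$

Remark: Although the products AB and BA are defined, they are not equal. In other words, matrix multiplication does not satisfy the commutative law that AB = BA.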


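7.4. Let $A = \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix}$. Find: (a) $A^2$, (b) $A^3$.

(a) $$A^2 = AA = \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix} \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix} = \begin{bmatrix} 1(1) + 3(2) & 1(3) + 3(4) \\ 2(1) + 4(2) & 2(3) + 4(4) \end{bmatrix} = \begin{bmatrix} 7 & 15 \\ 10 & 22 \end{bmatrix}$$

(b) $$A^3 = AA^2 = \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix} \begin{bmatrix} 7 & 15 \\ 10 & 22 \end{bmatrix} = \begin{bmatrix} 1(7) + 3(10) & 1(15) + 3(22) \\ 2(7) + 4(10) & 2(15) + 4(22) \end{bmatrix} = \begin{bmatrix} 37 & 81 \\ 54 & 118 \end{bmatrix}$$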

PROBABILITY VECTORS AND STOCHASTIC MATRICES 


7.5. Find a multiple of each vector v which is a probability vector q,: 
(a) v= [2, 1, 2, 0, 3], (c) v = [2/3, 1, 3/5, 5/6], 
(b) v = [1/2, 2/3, 2, 5/6], (d) v = [0, 0, 0, 0] 


(a) The sum of the components of v is 8; hence multiply v by 1/8, that is, multiply each component of v by 1/8 to obtain the probability vector

$$q_v = [1/4,\ 1/8,\ 1/4,\ 0,\ 3/8]$$

(b) First multiply the vector v by 6 to eliminate fractions. This yields the vector v' = [3, 4, 12, 5]. The sum of the components of v' is 24. Then multiply each component of v' by 1/24 to obtain the probability vector

$$q_v = [1/8,\ 1/6,\ 1/2,\ 5/24]$$

Note that $q_v$ is also a multiple of v.

(c) First multiply the vector v by 30 to obtain v' = [20, 30, 18, 25]. The sum of the components of v' is 93. Then multiply each component of v' by 1/93 to obtain

$$q_v = [20/93,\ 30/93,\ 18/93,\ 25/93]$$


(d) Every scalar multiple of the zero vector is the zero vector whose components add up to 0. Thus, no 
multiple of the zero vector is a probability vector. 
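The procedure of Problem 7.5 amounts to dividing a vector by the sum of its components. A minimal sketch follows; the name `probability_multiple` is hypothetical, and exact fractions are used so the results match the hand computations. The function returns `None` when no multiple is a probability vector, which covers the zero vector of part (d) and, as an added assumption beyond the text, vectors whose components have mixed signs or sum to zero.

```python
from fractions import Fraction

def probability_multiple(v):
    """Return the scalar multiple of v whose entries are nonnegative and sum to 1,
    or None if no such multiple exists."""
    entries = [Fraction(x) for x in v]
    total = sum(entries)
    if total == 0:
        return None                      # e.g. the zero vector
    q = [x / total for x in entries]     # entries now sum to 1
    if any(x < 0 for x in q):
        return None                      # mixed signs: no multiple is nonnegative
    return q

print(probability_multiple([2, 1, 2, 0, 3]))
print(probability_multiple([Fraction(1, 2), Fraction(2, 3), 2, Fraction(5, 6)]))
print(probability_multiple([0, 0, 0, 0]))
```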
7.6. Determine which of the following are stochastic matrices:


1/3 13 1/3 3/4 1/4 3/2 —1/2 3/4 1/4 
ad be 0 ae we Be aah ia Fe a am Ee 7 
A is not a stochastic matrix since it is not a square matrix. 
B is not a stochastic matrix since the sum of the entries in the second row exceeds 1. 


C is not a stochastic matrix since an entry is negative. 
D is a stochastic matrix. 


7.7. Suppose $A = [a_{ij}]$ is an n-square stochastic matrix, and $u = [u_1, u_2, \ldots, u_n]$ is a probability vector. Prove that uA is also a probability vector.

By matrix multiplication:

$$uA = [u_1, u_2, \ldots, u_n] \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix} = \left[\sum_i u_i a_{i1},\ \sum_i u_i a_{i2},\ \ldots,\ \sum_i u_i a_{in}\right]$$

Since the $u_i$ and the $a_{ij}$ are nonnegative, the components of uA are also nonnegative. Thus, it only remains to show that the sum S of the components of uA is equal to one. Using the fact that the sum of the entries in any row of A is equal to one and that the sum of the components of u is equal to one, that is, using $\sum_j a_{ij} = 1$, for any i, and $\sum_i u_i = 1$, we get

$$\begin{aligned} S &= \sum_i u_i a_{i1} + \sum_i u_i a_{i2} + \cdots + \sum_i u_i a_{in} \\ &= u_1 \sum_j a_{1j} + u_2 \sum_j a_{2j} + \cdots + u_n \sum_j a_{nj} \\ &= u_1(1) + u_2(1) + \cdots + u_n(1) \\ &= u_1 + u_2 + \cdots + u_n = 1 \end{aligned}$$

Thus, uA is a probability vector.


7.8. Prove Theorem 7.2. Suppose A and B are stochastic matrices. Then the product AB is also a stochastic matrix. Thus, in particular, all powers $A^n$ are stochastic matrices.

Let $s_i$ denote the ith row of the product matrix AB. Then $s_i$ is obtained by multiplying the ith row $r_i$ of A by the matrix B, that is,

$$s_i = r_i B$$

Since $r_i$ is a probability vector and B is a stochastic matrix, the product $s_i$ is also a probability vector by the preceding Problem 7.7. Thus, AB is a stochastic matrix since each row is a probability vector.


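7.9. Let $p = [p_1, p_2, \ldots, p_n]$ be a probability vector, and let T be a matrix whose rows are all the same vector $t = [t_1, t_2, \ldots, t_n]$. Prove that pT = t.

Here we use the fact that $p_1 + p_2 + \cdots + p_n = \sum p_i = 1$. We have

$$\begin{aligned} pT &= [p_1, p_2, \ldots, p_n] \begin{bmatrix} t_1 & t_2 & \cdots & t_n \\ t_1 & t_2 & \cdots & t_n \\ \vdots & \vdots & & \vdots \\ t_1 & t_2 & \cdots & t_n \end{bmatrix} \\ &= \left[\sum p_i t_1,\ \sum p_i t_2,\ \ldots,\ \sum p_i t_n\right] \\ &= \left[t_1 \sum p_i,\ t_2 \sum p_i,\ \ldots,\ t_n \sum p_i\right] \\ &= [t_1, t_2, \ldots, t_n] = t \end{aligned}$$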
REGULAR STOCHASTIC MATRICES AND FIXED PROBABILITY VECTORS


7.10. Find the unique fixed probability vector t of the regular stochastic matrix

\[ A = \begin{bmatrix} 3/4 & 1/4 \\ 1/2 & 1/2 \end{bmatrix} \]

Which matrix does $A^n$ approach as n becomes larger?


We seek a probability vector t = [x, 1 - x] such that tA = t. Thus, set

\[ [x,\ 1 - x] \begin{bmatrix} 3/4 & 1/4 \\ 1/2 & 1/2 \end{bmatrix} = [x,\ 1 - x] \]

Multiply the left side of the above matrix equation and then set corresponding components equal to each
other to obtain the following two equations:

\[ \tfrac{3}{4}x + \tfrac{1}{2} - \tfrac{1}{2}x = x \qquad\text{and}\qquad \tfrac{1}{4}x + \tfrac{1}{2} - \tfrac{1}{2}x = 1 - x \]

Solve either equation to obtain x = 2/3. Thus, t = [2/3, 1/3].


The matrix $A^n$ approaches the matrix T whose rows are each the fixed point t; hence $A^n$
approaches

\[ T = \begin{bmatrix} 2/3 & 1/3 \\ 2/3 & 1/3 \end{bmatrix} \]
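The convergence of the powers of A can be observed directly. The sketch below, which assumes the matrix A = [[3/4, 1/4], [1/2, 1/2]] of this problem and uses hypothetical helper names mat_mul and mat_pow, computes the tenth power with exact fractions and measures how far its rows are from t = [2/3, 1/3].

```python
from fractions import Fraction as F

def mat_mul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def mat_pow(X, n):
    """n-th power of a square matrix by repeated multiplication (n >= 0)."""
    size = len(X)
    result = [[F(int(i == j)) for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, X)
    return result

A = [[F(3, 4), F(1, 4)], [F(1, 2), F(1, 2)]]
t = [F(2, 3), F(1, 3)]

A10 = mat_pow(A, 10)
# Largest distance between an entry of A^10 and the matching component of t.
max_gap = max(abs(A10[i][j] - t[j]) for i in range(2) for j in range(2))
print(float(max_gap))
```

The gap shrinks by a factor of 4 with each additional power for this particular matrix, so a modest exponent already gives rows that agree with t to several decimal places.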


7.11. Consider the general 2 × 2 stochastic matrix

\[ M = \begin{bmatrix} 1-a & a \\ b & 1-b \end{bmatrix} \]

Prove that the vector u = [b, a] is a fixed point of M. (Note that the fixed point u of M consists of the nondiagonal
elements of M.)


Matrix multiplication yields

\[ uM = [b,\ a] \begin{bmatrix} 1-a & a \\ b & 1-b \end{bmatrix} = [b - ab + ab,\ ab + a - ab] = [b,\ a] = u \]

Thus, u is a fixed point of M.


7.12. Use Problem 7.11 to find the unique fixed probability vector t of each stochastic matrix:

\[
\text{(a)}\ A = \begin{bmatrix} 1/3 & 2/3 \\ 1 & 0 \end{bmatrix} \qquad
\text{(b)}\ B = \begin{bmatrix} 1/2 & 1/2 \\ 2/3 & 1/3 \end{bmatrix} \qquad
\text{(c)}\ C = \begin{bmatrix} 0.7 & 0.3 \\ 0.8 & 0.2 \end{bmatrix}
\]

(a) By Problem 7.11, u = [1, 2/3] is a fixed point of A. Multiply u by 3 to obtain the fixed point [3, 2]
of A which has no fractions. Since the sum of the components of [3, 2] is 5, multiply [3, 2] by 1/5
to obtain the required probability vector t = [3/5, 2/5].

(b) By Problem 7.11, u = [2/3, 1/2] is a fixed point of B. Multiply u by 6 to obtain the fixed point [4, 3]
of B which has no fractions. Since the sum of the components of [4, 3] is 7, multiply [4, 3] by 1/7 to
obtain the required probability vector t = [4/7, 3/7].

(c) By Problem 7.11, u = [0.8, 0.3] is a fixed point of C. Hence [8, 3] and the probability vector
t = [8/11, 3/11] are also fixed points of C.
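The recipe of Problems 7.11 and 7.12 (take the nondiagonal entries [b, a] and divide by their sum) can be written as a short function. This is a sketch only; the function name fixed_probability_vector_2x2 is hypothetical, and it assumes the matrix is written as [[1 - a, a], [b, 1 - b]] with a + b > 0.

```python
from fractions import Fraction as F

def fixed_probability_vector_2x2(a, b):
    """Fixed probability vector of M = [[1 - a, a], [b, 1 - b]].

    By Problem 7.11, u = [b, a] is a fixed point; dividing by b + a
    rescales it so that its components sum to one.
    """
    total = a + b
    if total == 0:
        raise ValueError("M is the identity matrix; every vector is fixed")
    return [b / total, a / total]

# The three matrices of Problem 7.12, given by their nondiagonal entries (a, b).
t_A = fixed_probability_vector_2x2(F(2, 3), F(1))
t_B = fixed_probability_vector_2x2(F(1, 2), F(2, 3))
t_C = fixed_probability_vector_2x2(F(3, 10), F(8, 10))
print(t_A, t_B, t_C)
```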


7.13. Find the unique fixed probability vector t of the following regular stochastic matrix:

\[ P = \begin{bmatrix} 0 & 1 & 0 \\ 1/6 & 1/2 & 1/3 \\ 0 & 2/3 & 1/3 \end{bmatrix} \]

Which matrix does $P^n$ approach as n becomes larger?


We first seek any fixed vector u = [x, y, z] of P. Thus, set

\[ [x, y, z] \begin{bmatrix} 0 & 1 & 0 \\ 1/6 & 1/2 & 1/3 \\ 0 & 2/3 & 1/3 \end{bmatrix} = [x, y, z] \]

Multiply the left side of the above matrix equation and then set corresponding components equal to each
other to obtain the following system of three equations:

\[
\begin{aligned}
\tfrac{1}{6}y &= x \\
x + \tfrac{1}{2}y + \tfrac{2}{3}z &= y \\
\tfrac{1}{3}y + \tfrac{1}{3}z &= z
\end{aligned}
\qquad\text{or}\qquad
\begin{aligned}
y &= 6x \\
6x + 3y + 4z &= 6y \\
y + z &= 3z
\end{aligned}
\qquad\text{or}\qquad
\begin{aligned}
y &= 6x \\
6x + 4z &= 3y \\
y &= 2z
\end{aligned}
\]

We know the system has a nonzero solution; hence we can arbitrarily assign a nonzero value to one of the
unknowns. Set x = 1. By the first equation y = 6, and by the last equation z = 3. Thus, u = [1, 6, 3] is
a fixed point of P. Since 1 + 6 + 3 = 10, the vector

t = [1/10, 6/10, 3/10]

is the required unique fixed probability vector of P.

The matrix $P^n$ approaches the matrix T whose rows are each the fixed point t; hence $P^n$
approaches

\[ T = \begin{bmatrix} 1/10 & 6/10 & 3/10 \\ 1/10 & 6/10 & 3/10 \\ 1/10 & 6/10 & 3/10 \end{bmatrix} \]
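The hand computation above generalizes to any regular stochastic matrix: solve tP = t together with the condition that the components of t sum to one. The following sketch does this by Gauss-Jordan elimination over exact fractions. The function name fixed_probability_vector is hypothetical, and the routine assumes the matrix is regular, so that the solution is unique (for other matrices the pivot search may fail).

```python
from fractions import Fraction as F

def fixed_probability_vector(P):
    """Solve t P = t with sum(t) = 1 for a regular stochastic matrix P."""
    n = len(P)
    # One equation per column j < n - 1 of t(P - I) = 0, plus the equation sum(t) = 1.
    rows = [[P[i][j] - (1 if i == j else 0) for i in range(n)] + [F(0)]
            for j in range(n - 1)]
    rows.append([F(1)] * n + [F(1)])
    for col in range(n):
        # Find a usable pivot, move it into place, and scale it to 1.
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [x / rows[col][col] for x in rows[col]]
        # Clear this column in every other row.
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    return [rows[i][n] for i in range(n)]

P = [[F(0), F(1), F(0)],
     [F(1, 6), F(1, 2), F(1, 3)],
     [F(0), F(2, 3), F(1, 3)]]
t = fixed_probability_vector(P)
print(t)
```

One equation of tP = t is dropped because the n equations are linearly dependent (each row of P sums to one); the normalization condition takes its place.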


7.14. Suppose P is a stochastic matrix. [Assume P is not a 1 × 1 matrix.]

(a) Suppose t = [1/4, 0, 1/2, 1/4, 0] is a fixed point of P. Explain why P is not regular.

(b) Suppose P has 1 on the diagonal. Show that P is not regular.


(a) Theorem 7.3 tells us that if P is regular, then P has a unique fixed probability vector whose components
are all positive. Since the fixed probability vector t has zero components, P is not regular.

(b) Let $e_k$ be the vector with 1 in the kth position and 0's elsewhere, that is, $e_k$ has the following
form:

\[ e_k = [0, \ldots, 0, 1, 0, \ldots, 0] \]

Suppose the kth diagonal entry of P is 1. Since P is a stochastic matrix, $e_k$ must be the kth row of
P. By matrix multiplication, $e_k$ will be the kth row of all powers of P. Thus, every power of P has zero
entries, and so P is not regular.


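7.15. Determine which of the following stochastic matrices are regular:

\[
\text{(a)}\ A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \qquad
\text{(b)}\ B = \begin{bmatrix} 1/2 & 1/4 & 1/4 \\ 0 & 1 & 0 \\ 1/2 & 1/2 & 0 \end{bmatrix} \qquad
\text{(c)}\ C = \begin{bmatrix} 0 & 0 & 1 \\ 1/2 & 1/4 & 1/4 \\ 0 & 1 & 0 \end{bmatrix}
\]

Recall that a stochastic matrix is regular if a power of the matrix has only positive entries.


(a) We have

\[
A^2 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}
    = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = \text{the unit matrix } I
\]

\[
A^3 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}
    = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} = A
\]

Thus, every even power of A is the unit matrix I, and every odd power of A is the matrix A. Thus,
every power of A has zero entries, and so A is not regular.

(b) The matrix B is not regular since it has a 1 on its diagonal.

(c) Computing $C^2$ and $C^3$ yields

\[
C^2 = \begin{bmatrix} 0 & 1 & 0 \\ 1/8 & 5/16 & 9/16 \\ 1/2 & 1/4 & 1/4 \end{bmatrix} \qquad
C^3 = \begin{bmatrix} 1/2 & 1/4 & 1/4 \\ 5/32 & 41/64 & 13/64 \\ 1/8 & 5/16 & 9/16 \end{bmatrix}
\]

Since all entries in $C^3$ are positive, C is regular.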
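Regularity depends only on which entries of the powers are positive, not on their values, so it can be tested on the zero pattern alone. The sketch below uses a hypothetical function first_positive_power; it assumes the standard bound (due to Wielandt, not stated in the text) that an n-square regular matrix already has a strictly positive power with exponent at most (n - 1)^2 + 1, so the search may stop there.

```python
from fractions import Fraction as F

def first_positive_power(P):
    """Smallest k with all entries of P^k positive, or None if P is not regular."""
    n = len(P)
    pattern = [[x > 0 for x in row] for row in P]   # True where an entry is positive
    power = pattern                                  # zero pattern of P^k, starting at k = 1
    for k in range(1, (n - 1) ** 2 + 2):
        if all(all(row) for row in power):
            return k
        # Pattern of P^(k+1): entry (i, j) is positive when some m links i to j.
        power = [[any(power[i][m] and pattern[m][j] for m in range(n))
                  for j in range(n)] for i in range(n)]
    return None

A = [[0, 1], [1, 0]]
B = [[F(1, 2), F(1, 4), F(1, 4)], [0, 1, 0], [F(1, 2), F(1, 2), 0]]
C = [[0, 0, 1], [F(1, 2), F(1, 4), F(1, 4)], [0, 1, 0]]
print(first_positive_power(A), first_positive_power(B), first_positive_power(C))
```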
7.16. Bob's study habits are as follows. If he studies one night, he is 70 percent sure not to study the
next night. On the other hand, if he does not study one night, he is only 60 percent sure not
to study the next night as well. Find out how often, in the long run, Bob studies.


This is a Markov process where the states of the system are S (studying) and T (not studying). The
transition matrix M of the process, with rows and columns in the order S, T, is as follows:

\[ M = \begin{bmatrix} 0.3 & 0.7 \\ 0.4 & 0.6 \end{bmatrix} \]

To discover what happens in the long run, we must find the unique fixed probability vector t of M. By
Problem 7.11, u = [0.4, 0.7] is a fixed point of M and so t = [4/11, 7/11] is the required fixed probability
vector. Thus, in the long run, Bob studies 4/11 of the time.


7.17. A psychologist makes the following assumptions concerning the behavior of mice subjected to
a particular feeding schedule. For any particular trial, 80 percent of the mice that went right
on the previous experiment will go right on this trial, and 60 percent of those mice that went left
on the previous experiment will go right on this trial. Suppose 50 percent of the mice went
right on the first trial.

(a) Find the prediction of the psychologist for the next two trials.
(b) When will the process stabilize?


The states of the system are R (right) and L (left), and the transition matrix M of the process, with rows
and columns in the order R, L, is as follows:

\[ M = \begin{bmatrix} 0.8 & 0.2 \\ 0.6 & 0.4 \end{bmatrix} \]

The probability distribution for the first (initial) trial is q = [0.5, 0.5].

(a) To predict the probability distribution for the next step (second trial), multiply q by M. This
yields

\[ [0.5,\ 0.5] \begin{bmatrix} 0.8 & 0.2 \\ 0.6 & 0.4 \end{bmatrix} = [0.7,\ 0.3] \]

Thus, the psychologist predicts 70 percent of the mice will go right and 30 percent will go left on the
second trial. To predict the probability distribution for the next step (third trial), multiply the
previous distribution by M. This yields

\[ [0.7,\ 0.3] \begin{bmatrix} 0.8 & 0.2 \\ 0.6 & 0.4 \end{bmatrix} = [0.74,\ 0.26] \]

Thus, the psychologist predicts 74 percent of the mice will go right and 26 percent will go left on the
third trial.

(b) The process will stabilize when it reaches its fixed probability distribution t. By Problem 7.11,
u = [0.6, 0.2] is a fixed point of M and so t = [3/4, 1/4] = [0.75, 0.25]. The fourth trial, rounded to
two decimal places, gives the state distribution [0.75, 0.25]. Thus, the process stabilizes after the
third trial.
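The successive distributions in this problem can be generated by repeated multiplication by M. The sketch below uses a hypothetical helper named step and exact fractions; the list trials holds the distributions for the first through fourth trials.

```python
from fractions import Fraction as F

def step(q, M):
    """Advance a state distribution q by one step of the transition matrix M."""
    return [sum(q[i] * M[i][j] for i in range(len(q))) for j in range(len(M[0]))]

M = [[F(8, 10), F(2, 10)],   # row R: right -> right, right -> left
     [F(6, 10), F(4, 10)]]   # row L: left -> right, left -> left
trials = [[F(1, 2), F(1, 2)]]          # first trial
for _ in range(3):                     # second, third, and fourth trials
    trials.append(step(trials[-1], M))

for number, q in enumerate(trials, start=1):
    print(number, [float(x) for x in q])
```

The fourth distribution is [0.748, 0.252], which is the value that rounds to [0.75, 0.25] in part (b).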


7.18. Consider a Markov process with initial probability distribution $q_0 = [1/2, 0, 1/2]$ and the
following transition matrix:

\[ M = \begin{bmatrix} 0 & 1/2 & 1/2 \\ 1/2 & 1/2 & 0 \\ 0 & 1 & 0 \end{bmatrix} \]

(a) Find the following three probability distributions $q_1$, $q_2$, and $q_3$.

(b) Find the matrix that $M^n$ approaches as n gets larger.


(a) Multiply $q_0$ by M to obtain $q_1$:

\[ q_1 = q_0 M = [1/2,\ 0,\ 1/2] \begin{bmatrix} 0 & 1/2 & 1/2 \\ 1/2 & 1/2 & 0 \\ 0 & 1 & 0 \end{bmatrix} = [0,\ 3/4,\ 1/4] \]

Multiply $q_1$ by M to obtain $q_2$:

\[ q_2 = q_1 M = [0,\ 3/4,\ 1/4] \begin{bmatrix} 0 & 1/2 & 1/2 \\ 1/2 & 1/2 & 0 \\ 0 & 1 & 0 \end{bmatrix} = [3/8,\ 5/8,\ 0] \]

Multiply $q_2$ by M to obtain $q_3$:

\[ q_3 = q_2 M = [3/8,\ 5/8,\ 0] \begin{bmatrix} 0 & 1/2 & 1/2 \\ 1/2 & 1/2 & 0 \\ 0 & 1 & 0 \end{bmatrix} = [5/16,\ 1/2,\ 3/16] \]

(b) $M^n$ approaches the matrix T whose rows are each the unique fixed probability vector t of M. To find
t, first find any fixed vector u = [x, y, z] of M. Thus

\[
[x, y, z] \begin{bmatrix} 0 & 1/2 & 1/2 \\ 1/2 & 1/2 & 0 \\ 0 & 1 & 0 \end{bmatrix} = [x, y, z]
\qquad\text{or}\qquad
\begin{aligned}
\tfrac{1}{2}y &= x \\
\tfrac{1}{2}x + \tfrac{1}{2}y + z &= y \\
\tfrac{1}{2}x &= z
\end{aligned}
\]

Find any nonzero solution of the above system of linear equations. Set z = 1. By the third
equation x = 2, and by the first equation y = 4. Thus, u = [2, 4, 1] is a fixed point of M and so
t = [2/7, 4/7, 1/7]. Accordingly, $M^n$ approaches the following matrix:

\[ T = \begin{bmatrix} 2/7 & 4/7 & 1/7 \\ 2/7 & 4/7 & 1/7 \\ 2/7 & 4/7 & 1/7 \end{bmatrix} \]


7.19. A salesman S sells in only three cities, A, B, and C. Suppose S never sells in the same city on
successive days. If S sells in city A, then the next day S sells in city B. However, if S sells in
either B or C, then the next day S is twice as likely to sell in city A as in the other city. Find
out how often, in the long run, S sells in each city.

The transition matrix of the Markov process, with rows and columns in the order A, B, C, follows:

\[ M = \begin{bmatrix} 0 & 1 & 0 \\ 2/3 & 0 & 1/3 \\ 2/3 & 1/3 & 0 \end{bmatrix} \]

The first row [0, 1, 0] comes from the fact that if S sells in city A, then S will always sell in city B the next
day. The 2/3 and 1/3 in the second row and third row come from the fact that if S sells in either city B
or C, then S is twice as likely to sell in city A the next day as in the other city. (S is never in the same
city 2 days in a row.)

We seek the unique fixed probability vector t of the transition matrix M. To find t, we first find any
fixed vector u = [x, y, z] of M. Thus

\[
[x, y, z] \begin{bmatrix} 0 & 1 & 0 \\ 2/3 & 0 & 1/3 \\ 2/3 & 1/3 & 0 \end{bmatrix} = [x, y, z]
\qquad\text{or}\qquad
\begin{aligned}
\tfrac{2}{3}y + \tfrac{2}{3}z &= x \\
x + \tfrac{1}{3}z &= y \\
\tfrac{1}{3}y &= z
\end{aligned}
\]

We find any nonzero solution of the above system of linear equations. Set z = 1. By the third
equation y = 3, and by the first equation x = 8/3. Thus, u = [8/3, 3, 1]. Also, 3u = [8, 9, 3] is a
fixed point of M. Multiply 3u by 1/(8 + 9 + 3) = 1/20 to obtain the unique fixed probability vector
t = [2/5, 9/20, 3/20] = [0.40, 0.45, 0.15]. Thus, in the long run, S sells 40 percent of the time in city A, 45
percent of the time in B, and 15 percent of the time in C.


7.20. There are 2 white marbles in box A and 3 red marbles in box B. At each step in the process,
a marble is selected from each box and the 2 marbles are interchanged. (Thus, box A always
has 2 marbles and box B always has 3 marbles.) The system may be described by three states,
$s_0$, $s_1$, $s_2$, which denote, respectively, the number of red marbles in box A.

(a) Find the transition matrix P of the system. 

(b) Find the probability that there are 2 red marbles in box A after 3 steps. 

(c) Find the probability that, in the long run, there are 2 red marbles in box A. 


The three states, $s_0$, $s_1$, $s_2$, may be described as follows:

             s_0     s_1       s_2
Box A        2W      1W, 1R    2R
Box B        3R      1W, 2R    2W, 1R


(a) There are three cases according to the state of the system.

(1) Suppose the system is in state $s_0$. Then a white marble must be chosen from box A and a red
marble from box B, so the system must move to state $s_1$. Thus, the first row of P must be
[0, 1, 0].

(2) Suppose the system is in state $s_1$. There are three subcases:

(i) The system can move to state $s_0$ if and only if a red marble is selected from
box A and a white marble from box B. The probability that this happens is
(1/2)(1/3) = 1/6.

(ii) The system can move to state $s_2$ if and only if a white marble is selected from
box A and a red marble from box B. The probability that this happens is
(1/2)(2/3) = 1/3.

(iii) By (i) and (ii), the system remains in state $s_1$ with probability 1 - 1/6 - 1/3 = 1/2.

Thus, the second row of P must be [1/6, 1/2, 1/3].

(3) Suppose the system is in state $s_2$. A red marble must be drawn from box A. If a red marble
is selected from box B, probability 1/3, then the system remains in state $s_2$; but if a white marble
is selected from box B, then the system moves to state $s_1$. The system can never move from
$s_2$ to $s_0$. Thus, the third row of P must be [0, 2/3, 1/3].

Therefore, the required transition matrix, with rows and columns in the order $s_0$, $s_1$, $s_2$, is as follows:

\[ P = \begin{bmatrix} 0 & 1 & 0 \\ 1/6 & 1/2 & 1/3 \\ 0 & 2/3 & 1/3 \end{bmatrix} \]


(b) The system begins in state $s_0$. Thus, the initial probability distribution is $q_0 = [1, 0, 0]$. Therefore:

\[ q_1 = q_0 P = [0,\ 1,\ 0] \qquad q_2 = q_1 P = [1/6,\ 1/2,\ 1/3] \qquad q_3 = q_2 P = [1/12,\ 23/36,\ 5/18] \]

Accordingly, the probability that there are 2 red marbles in box A after three steps is 5/18.


(c) We seek the unique fixed probability vector t of the transition matrix P. To find t, we first find any
fixed vector [x, y, z] of P. Thus:

\[
[x, y, z] \begin{bmatrix} 0 & 1 & 0 \\ 1/6 & 1/2 & 1/3 \\ 0 & 2/3 & 1/3 \end{bmatrix} = [x, y, z]
\qquad\text{or}\qquad
\begin{aligned}
\tfrac{1}{6}y &= x \\
x + \tfrac{1}{2}y + \tfrac{2}{3}z &= y \\
\tfrac{1}{3}y + \tfrac{1}{3}z &= z
\end{aligned}
\]

We find any nonzero solution of the above system of linear equations. Set, say, x = 1. By the first
equation y = 6, and by the third equation z = 3. Thus, u = [1, 6, 3] is a fixed point of P. Multiply
u by 1/(1 + 6 + 3) = 1/10 to obtain the unique fixed probability vector t = [0.1, 0.6, 0.3]. Thus, in the
long run, 30 percent of the time there will be 2 red marbles in box A.


Remark: Note that the long-run probability distribution is the same as if the 5 marbles were placed 
in a box, and 2 marbles were selected at random to put in box A. 
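The Remark can be verified by counting. The sketch below, with a hypothetical function name red_count_distribution, computes the probability of drawing k red marbles when 2 marbles are taken at random from a box holding 3 red and 2 white marbles; the result should agree with the fixed vector t = [0.1, 0.6, 0.3] found in part (c).

```python
from fractions import Fraction as F
from math import comb

def red_count_distribution(red=3, white=2, drawn=2):
    """Probabilities of getting k = 0, 1, ..., drawn red marbles in a random draw."""
    total = comb(red + white, drawn)   # number of equally likely selections
    return [F(comb(red, k) * comb(white, drawn - k), total)
            for k in range(drawn + 1)]

distribution = red_count_distribution()
print(distribution)
```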


MISCELLANEOUS PROBLEMS 


7.21. The transition probabilities of a Markov process may be described by a diagram, called a
transition diagram, as follows. The states are points (vertices) in the diagram, and a positive
probability $p_{ij}$ is denoted by an arrow (edge) from state $a_i$ to the state $a_j$ labelled by $p_{ij}$. Find
the transition matrix P of each transition diagram in Fig. 7-1.


Fig. 7-1 (transition diagrams (a) and (b))
(a) The state space is $S = \{a_1, a_2, a_3\}$, and hence the transition matrix P has the following form:


a 
aq 
P= ay 


a3 


Row i of P is obtained by finding the arrows which emanate from $a_i$ in the diagram; the number attached
to the arrow from $a_i$ to $a_j$ is the jth component of row i.


matrix: 


ay 0 
P = a)| 1/2 
1/2 


a3 


(b) The state space is $S = \{a_1, a_2, a_3, a_4\}$.
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a, a3 


4. a3 
0 1 
0 1/2 
0 1/2 


The required transition matrix is as follows: 


a a3 ay 

1/2 O 1/2 

1/2 O 1/2 
0 0 1/2 
0 1 O 


Thus, the following is the required transition 


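7.22. Suppose the following is the transition matrix of a Markov process, with rows and columns in the
order $a_1$, $a_2$, $a_3$, $a_4$:

\[ P = \begin{bmatrix} 1/2 & 1/2 & 0 & 0 \\ 1/2 & 1/2 & 0 & 0 \\ 1/4 & 1/4 & 1/4 & 1/4 \\ 1/4 & 1/4 & 1/4 & 1/4 \end{bmatrix} \]

Show that the Markov process is not regular.


Note that once the system enters state $a_1$ or $a_2$, then it can never move to state $a_3$ or $a_4$, that is, the
system remains in the state subspace $\{a_1, a_2\}$. Thus, every power of P will have 0 entries in the 3rd and
4th positions of the first and second rows, that is, in the positions

(1, 3), (1, 4), (2, 3), (2, 4)

Hence no power of P has only positive entries, and so the process is not regular.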


Supplementary Problems 


MATRIX MULTIPLICATION 


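7.23. Given $A = \begin{bmatrix} 1 & -2 & 3 \\ 4 & 1 & -1 \\ 5 & 2 & 3 \end{bmatrix}$. Find uA where: (a) u = [1, -3, 2], (b) u = [3, 0, -2],
(c) u = [4, -1, -1].

7.24. Given $A = \begin{bmatrix} 1 & -1 & 4 \\ 3 & 1 & 5 \end{bmatrix}$ and $B = \begin{bmatrix} 2 & 1 \\ 6 & -3 \\ 1 & -2 \end{bmatrix}$. Find AB and BA.

7.25. Given $A = \begin{bmatrix} 2 & 2 \\ 3 & -1 \end{bmatrix}$. Find $A^2$ and $A^3$.

7.26. Given $A = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}$. Find (a) $A^2$, (b) $A^3$, (c) $A^n$.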


PROBABILITY VECTORS AND STOCHASTIC MATRICES 


7.27. Which vectors are probability vectors? 


u = [1/4, 1/2, —1/4, 1/2], v = [1/2, 0, 1/3, 1/6, 1/6], w = [1/12, 1/2, 1/6, 0, 1/4] 


7.28. Find a scalar multiple of each vector v which is a probability vector: 


(a) v = [3, 0, 2, 5, 3], (b) v = [2, 1/2, 0, 1/4, 3/4, 1], (c) v = [1/2, 1/6, 0, 1/4].


7.29. Which matrices are stochastic? 


0 1 0 1 0 0 1 
A = > B = > Cc = > 
Ee 1/4 Hl E ‘| las a 


REGULAR STOCHASTIC MATRICES AND FIXED PROBABILITY VECTORS 


7.30. Find the unique fixed probability vector t of each matrix: 


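\[
\text{(a)}\ A = \begin{bmatrix} 2/3 & 1/3 \\ 2/5 & 3/5 \end{bmatrix}, \qquad
\text{(b)}\ B = \begin{bmatrix} 0.2 & 0.8 \\ 0.5 & 0.5 \end{bmatrix}, \qquad
\text{(c)}\ C = \begin{bmatrix} 0.7 & 0.3 \\ 0.6 & 0.4 \end{bmatrix}
\]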


7.31. Find the unique fixed probability vector t of each matrix: 


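\[
\text{(a)}\ A = \begin{bmatrix} 0 & 1/2 & 1/2 \\ 1/3 & 2/3 & 0 \\ 0 & 1 & 0 \end{bmatrix}, \qquad
\text{(b)}\ B = \begin{bmatrix} 0 & 1 & 0 \\ 1/2 & 0 & 1/2 \\ 1/2 & 1/4 & 1/4 \end{bmatrix}
\]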


7.32. Consider the following stochastic matrix: 


\[ P = \begin{bmatrix} 0 & 3/4 & 1/4 \\ 1/2 & 1/2 & 0 \\ 0 & 1 & 0 \end{bmatrix} \]

(a) Show that P is regular.

(b) Find the unique fixed probability vector t of P.

(c) What matrix does $P^n$ approach?

(d) What vector does $[1/4, 1/4, 1/2]P^n$ approach?


7.33. Consider the following stochastic matrix:

\[ P = \begin{bmatrix} 0 & 1/2 & 1/2 & 0 \\ 1/2 & 1/4 & 0 & 1/4 \\ 0 & 0 & 0 & 1 \\ 0 & 1/2 & 0 & 1/2 \end{bmatrix} \]

(a) Show that P is regular.

(b) Find the unique fixed probability vector t of P.

(c) What matrix does $P^n$ approach?

(d) What vector does $[1/4, 0, 1/2, 1/4]P^n$ approach?


7.34. Consider the following general 3 × 3 stochastic matrix:

\[ P = \begin{bmatrix} 1-a-b & a & b \\ c & 1-c-d & d \\ e & f & 1-e-f \end{bmatrix} \]

Show that the following vector v is a fixed point of P:

\[ v = [cf + ce + de,\ af + bf + ae,\ ad + bd + bc] \]
7.35. John either drives or takes the train to work. If he drives to work, then the next day he takes the train
with probability 0.2. On the other hand, if he takes the train to work, then the next day he drives with
probability 0.3. Find out how often, in the long run, he drives to work.


7.36. Mary's gambling luck follows a pattern. If she wins a game, the probability of winning the next game is
0.6. However, if she loses a game, the probability of losing the next game is 0.7. There is an even chance
that she wins the first game.

(a) Find the transition matrix M of the Markov process.

(b) Find the probability that she wins the second game.

(c) Find the probability that she wins the third game.

(d) Find out how often, in the long run, she wins.


7.37. Suppose $q_0 = [1/4, 3/4]$ is the initial state distribution for a Markov process with the following transition
matrix:

\[ M = \begin{bmatrix} 1/2 & 1/2 \\ 3/4 & 1/4 \end{bmatrix} \]

(a) Find $q_1$, $q_2$, and $q_3$. (b) Find the vector v that $q_0 M^n$ approaches. (c) Find the matrix that $M^n$
approaches.


7.38. Suppose $q_0 = [1/2, 1/2, 0]$ is the initial state distribution for a Markov process with the following transition
matrix:

\[ M = \begin{bmatrix} 1/2 & 0 & 1/2 \\ 1 & 0 & 0 \\ 1/4 & 1/2 & 1/4 \end{bmatrix} \]

(a) Find $q_1$, $q_2$, and $q_3$. (b) Find the vector v that $q_0 M^n$ approaches. (c) Find the matrix that $M^n$
approaches.
7.39. Each year Ann trades her car for a new car. If she has a Buick, she trades it in for a Plymouth. If she
has a Plymouth, she trades it in for a Ford. However, if she has a Ford, she is just as likely to trade it in 
for a new Ford as to trade it in for a Buick or for a Plymouth. In 1995 she bought her first car which was 
a Ford. 


(a) Find the probability that she has bought: (i) a 1997 Buick, (ii) a 1998 Plymouth, (iii) a 1998 Ford. 
(b) Find out how often, in the long run, she will have a Ford. 


MISCELLANEOUS PROBLEMS 


7.40. Find the transition matrix corresponding to each transition diagram in Fig. 7-2. 
Fig. 7-2 (transition diagrams (a) and (b))


7.41. Draw a transition diagram for each transition matrix: 


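(a) $P = \begin{bmatrix} 1/2 & 1/2 \\ 1/3 & 2/3 \end{bmatrix}$, with rows and columns in the order $a_1$, $a_2$;

(b) $P = \begin{bmatrix} 0 & 1/2 & 1/2 \\ 1/4 & 1/4 & 1/2 \\ 0 & 1/2 & 1/2 \end{bmatrix}$, with rows and columns in the order $a_1$, $a_2$, $a_3$.


7.42. Consider the vector $e_i = [0, \ldots, 0, 1, 0, \ldots, 0]$ which has 1 in the ith position and 0's elsewhere. Show
that, whenever defined, $e_i A = r_i$, where $r_i$ is the ith row of A.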


Answers to Supplementary Problems 


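7.23. (a) [-1, -1, 12]; (b) [-7, -10, 3]; (c) [-5, -11, 10].

7.24. $AB = \begin{bmatrix} 0 & -4 \\ 17 & -10 \end{bmatrix}$; $BA = \begin{bmatrix} 5 & -1 & 13 \\ -3 & -9 & 9 \\ -5 & -3 & -6 \end{bmatrix}$.

7.25. $A^2 = \begin{bmatrix} 10 & 2 \\ 3 & 7 \end{bmatrix}$; $A^3 = \begin{bmatrix} 26 & 18 \\ 27 & -1 \end{bmatrix}$.

7.26. $A^2 = \begin{bmatrix} 1 & 4 \\ 0 & 1 \end{bmatrix}$; $A^3 = \begin{bmatrix} 1 & 6 \\ 0 & 1 \end{bmatrix}$; $A^n = \begin{bmatrix} 1 & 2n \\ 0 & 1 \end{bmatrix}$.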
7.27. Only w.

7.28. (a) (1/13)[3, 0, 2, 5, 3]; (b) (1/18)[8, 2, 0, 1, 3, 4]; (c) (1/11)[6, 2, 0, 3].

7.29. Only B and D.

7.30. (a) [6/11, 5/11]; (b) [5/13, 8/13]; (c) [2/3, 1/3].

7.31. (a) [2/9, 6/9, 1/9]; (b) [5/15, 6/15, 4/15].

7.32. (a) $P^3$ has only positive entries; (b) t = [4/13, 8/13, 1/13]; (c) all rows are t; (d) t.

7.33. (a) $P^4$ has only positive entries; (b) t = [2/11, 4/11, 1/11, 4/11]; (c) all rows are t; (d) t.

7.35. Transition matrix $M = \begin{bmatrix} 0.8 & 0.2 \\ 0.3 & 0.7 \end{bmatrix}$. John drives 60 percent of the time.

7.36. (a) $M = \begin{bmatrix} 0.6 & 0.4 \\ 0.3 & 0.7 \end{bmatrix}$; (b) 45 percent; (c) 43.5 percent; (d) 3/7 ≈ 42.9 percent.

7.37. (a) $q_1 = [11/16, 5/16]$, $q_2 = [37/64, 27/64]$, $q_3 = [155/256, 101/256]$; (b) [3/5, 2/5];
(c) $\begin{bmatrix} 3/5 & 2/5 \\ 3/5 & 2/5 \end{bmatrix}$.

7.38. (a) $q_1 = [3/4, 0, 1/4]$, $q_2 = [7/16, 2/16, 7/16]$, $q_3 = [29/64, 14/64, 21/64]$; (b) [3/6, 1/6, 2/6];
(c) all rows are [3/6, 1/6, 2/6].

7.39. (a) (i) 1/9, (ii) 7/27, (iii) 16/27; (b) 50 percent of the time.

7.40.


0 oO 1 0 
a ny 1/4 1/2 0 1/4 
(a) | O 1/2 1/2}; (b) ‘ 
2 0 O 12 
1/2 1/4 1/4 
1/2 1/2 0 O 
7.41. See Fig. 7-3.


Fig. 7-3 (transition diagrams (a) and (b))
APPENDIX A


Descriptive
Statistics


A.1 INTRODUCTION


Statistics means, on the one hand, lists of numerical data, for example, the weights of the
students at a university, or the number of children per family in a city. Statistics as a science, on the
other hand, is that branch of mathematics which organizes, analyzes, and interprets such raw data.

This appendix will mainly cover topics related to the gathering and description of data, called
descriptive statistics. It is closely related to probability theory in that the probability model that one
develops for the events of a space usually depends on the relative frequencies of such events. The
topics of inferential statistics, such as estimation and hypothesis testing, lie beyond the scope of this
appendix and text.

The numerical data $x_1, x_2, \ldots$ we consider will either come from a random sample of a larger
population or from the larger population itself. We distinguish these two cases using different
notation as follows:

n = number of items in a sample              N = number of items in the population
$\bar{x}$ = sample mean                      $\mu$ = population mean
$s^2$ = sample variance                      $\sigma^2$ = population variance
s = sample standard deviation                $\sigma$ = population standard deviation


Note that Greek letters are used with the population and are called parameters, whereas Latin letters 
are used with the samples and are called statistics. First we will give formulas for the data coming 
from a sample. This will be followed by formulas for the population. 


A.2 FREQUENCY TABLES, HISTOGRAMS 


One of the first things that one usually does with a large list of numerical data is to collect them
into groups (grouped data). A group, sometimes called a category, refers to the set of numbers all
of which have the same value $x_i$, or to the set (class) of numbers in a given interval where the midpoint
$x_i$ of the interval, called the class value, serves as an approximation to the values in the interval. We
assume there are k such groups with $f_i$ denoting the number of elements (frequency) in the group with
value $x_i$ or class value $x_i$. Such grouped data yield a table, called a frequency distribution, as
follows:

Value (or class value) | $x_1$  $x_2$  $x_3$  ...  $x_k$
Frequency              | $f_1$  $f_2$  $f_3$  ...  $f_k$

Thus, the total number of data items is

\[ n = f_1 + f_2 + \cdots + f_k = \sum f_i \]

As usual, > will denote a summation over all the values of the index, unless otherwise specified. 


Our frequency distribution table usually lists, when applicable, the ends of the class intervals,
called class boundaries or class limits. We assume all intervals have the same length, called the class
width. If a data item falls on a class boundary, it is usually assigned to the higher class.

Sometimes the table also lists the cumulative frequency function F, where $F_s$ is defined by

\[ F_s = f_1 + f_2 + \cdots + f_s = \sum_{i \le s} f_i \]

That is, $F_s$ is the sum of the frequencies up to $f_s$. Thus, $F_k = n$, the number of data items.


The number k of groups that we decide to use to collect our data should not be too small or too
large. If it is too small, then we will lose much of the information of the given data; if it is too large,
then we will lose the purpose of grouping the data. The rule of thumb is that k should lie between
5 and 12. We illustrate the above with two examples. Note that any such frequency distribution can
then be pictured as a histogram or frequency polygon.


EXAMPLE A.1 Suppose an apartment house has n = 45 apartments, with the following numbers of tenants: 


2, 1, 3, 5, 2, 2, 2, 1, 4, 2, 6, 2, 4, 3, 1
2, 4, 3, 1, 4, 4, 2, 4, 4, 2, 2, 3, 1, 4, 2
3, 1, 5, 2, 4, 1, 3, 2, 4, 4, 2, 5, 1, 3, 4


Observe that the only numbers which appear in the list are 1, 2, 3, 4, 5, and 6. The frequency distribution, 
including the cumulative frequency distribution, follows: 


Number of people 1 2 3 4 5 6 
Frequency 8 14 7 12 3 1 
Cumulative frequency 8 22 29 41 44 45 


The sum of the frequencies is n = 45, which is also the last entry in the cumulative frequency row. 


Figure A-1 shows the histogram corresponding to the above frequency distribution. The 
histogram is simply a bar graph where the height of the bar is the frequency of the given number in 
the list. Similarly, the cumulative frequency distribution could be presented as a histogram; the 
heights of the bars would be 8, 22, 29, ..., 45. 
Fig. A-1 (histogram for Example A.1)        Fig. A-2 (histogram and frequency polygon for Example A.2; horizontal axis: Temperature, vertical axis: Days)


EXAMPLE A.2 Suppose the 6:00 p.m. temperatures (in degrees Fahrenheit) for a 35-day period are as
follows:

72.4, 78.2, 86.7, 93.4, 106.1, 107.6, 98.2, 92.0, 81.4, 77.2 

87.9, 82.4, 91.6, 95.0, 92.1, 83.9, 76.4, 78.4, 73.2, 81.4 

86.2, 92.4, 93.6, 84.8, 107.5, 99.2, 94.7, 86.1, 81.0, 77.7 

73.5, 76.0, 80.2, 88.8, 91.3 


Rather than find the frequency of each individual data item, it is more useful to collect the data in classes as
follows (where the temperature 95.0°F, which falls on a class boundary, is assigned to the higher class 95 to 100
rather than the lower class 90 to 95):

Class boundaries        70-75  75-80  80-85  85-90  90-95  95-100  100-105  105-110
Class value              72.5   77.5   82.5   87.5   92.5    97.5    102.5    107.5
Frequency                   3      6      7      5      8       3        0        3
Cumulative frequency        3      9     16     21     29      32       32       35


The class width for this distribution is w = 5. The sum of the frequencies is n = 35; it is also the last entry in the 
cumulative frequency row. 
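The grouping in Example A.2 can be reproduced mechanically. The sketch below assumes equal-width classes starting at 70 with width 5; the function name class_frequencies is hypothetical. Floor division places a value that lies exactly on a boundary, such as 95.0, in the higher class, matching the convention stated earlier in this section.

```python
from itertools import accumulate

temps = [72.4, 78.2, 86.7, 93.4, 106.1, 107.6, 98.2, 92.0, 81.4, 77.2,
         87.9, 82.4, 91.6, 95.0, 92.1, 83.9, 76.4, 78.4, 73.2, 81.4,
         86.2, 92.4, 93.6, 84.8, 107.5, 99.2, 94.7, 86.1, 81.0, 77.7,
         73.5, 76.0, 80.2, 88.8, 91.3]

def class_frequencies(data, low, width, k):
    """Count the data items in k classes [low, low + width), [low + width, ...), ..."""
    freq = [0] * k
    for x in data:
        # Floor division sends a boundary value to the higher class.
        freq[int((x - low) // width)] += 1
    return freq

freq = class_frequencies(temps, 70, 5, 8)
cumulative = list(accumulate(freq))   # running totals F_1, F_2, ..., F_k
print(freq)
print(cumulative)
```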


Figure A-2 shows the histogram corresponding to the above frequency distribution. It also shows 
the frequency polygon of the data, which is the line graph obtained by connecting the midpoints of the 
tops of the rectangles in the histogram. Observe that the line graph is extended to the class value 67.5 
on the left and to 112.5 on the right. In such a case, the sum of the areas of the rectangles equals the 
area bounded by the frequency polygon and the x axis. 


A.3 MEASURES OF CENTRAL TENDENCY: MEAN AND MEDIAN


There are various ways of giving an overview of data. One way is by graphical descriptions such 
as the frequency histogram or the frequency polygon discussed above. Another way is to use certain 
numerical descriptions of the data. Numbers, such as the mean and median, give, in some sense, the 
central or middle values of the data. The central tendency of our data is discussed in this section. 
The next section discusses other numbers, the variance and standard deviation, which measure the
dispersion or spread of the data about the mean, and the quartiles, which measure the dispersion or 
spread of the data about the median. 

Many formulas will be designated as (a) or (b) where (a) indicates ungrouped data and (b) 
indicates grouped data. Unless otherwise stated, we assume that our data come from a random 
sample of a (larger) population. Separate formulas are given for data which come from the total 
population itself. 


Mean (Arithmetic Mean) 


The arithmetic mean or simply mean of a sample $x_1, x_2, \ldots, x_n$ of n numerical values, denoted by
$\bar{x}$ (read: x-bar), is the sum of the values divided by the number of values. That is,

Sample mean:
\[ \bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum x_i}{n} \tag{A-1a} \]

Sample mean:
\[ \bar{x} = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_k x_k}{f_1 + f_2 + \cdots + f_k} = \frac{\sum f_i x_i}{\sum f_i} \tag{A-1b} \]

The mean $\bar{x}$ is frequently called the average value.


EXAMPLE A.3 


(a) Consider the data in Example A.1. Using the frequency distribution, rather than adding up the 45 numbers, 
we obtain the mean as follows: 

\[ \bar{x} = \frac{8(1) + 14(2) + 7(3) + 12(4) + 3(5) + 1(6)}{45} = \frac{126}{45} = 2.8 \]


In other words, there is an average of 2.8 people living in an apartment. 


(b) Consider the data in Example A.2. Using the frequency distribution with class values, rather than the exact 
35 numbers, we obtain the mean as follows: 
\[
\bar{x} = \frac{3(72.5) + 6(77.5) + 7(82.5) + 5(87.5) + 8(92.5) + 3(97.5) + 0(102.5) + 3(107.5)}{35}
        = \frac{3052.5}{35} \approx 87.2
\]


That is, the average 6:00 p.m. temperature is approximately 87.2°F. 
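Formula (A-1b) translates directly into code. The following sketch uses a hypothetical function name grouped_mean and exact fractions, and applies it to the frequency tables of Examples A.1 and A.2.

```python
from fractions import Fraction as F

def grouped_mean(values, freqs):
    """Mean of grouped data: sum of f_i * x_i divided by the sum of the f_i."""
    return F(sum(f * x for x, f in zip(values, freqs))) / sum(freqs)

# Example A.1: number of tenants per apartment.
mean_a = grouped_mean([1, 2, 3, 4, 5, 6], [8, 14, 7, 12, 3, 1])

# Example A.2: class values 72.5, 77.5, ..., 107.5 with their frequencies.
class_values = [F(145, 2) + 5 * i for i in range(8)]
mean_b = grouped_mean(class_values, [3, 6, 7, 5, 8, 3, 0, 3])

print(float(mean_a), float(mean_b))
```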


Median 


Suppose a list $x_1, x_2, \ldots, x_n$ of n data values is sorted in increasing order. The median of the data,
denoted by $\tilde{x}$ (read: x-tilde), is defined to be the midvalue (if n is odd) or the average of the two
middle values (if n is even). That is,

Median:
\[
\tilde{x} =
\begin{cases}
x_{k+1} & \text{when n is odd and } n = 2k + 1 \\[4pt]
\dfrac{x_k + x_{k+1}}{2} & \text{when n is even and } n = 2k
\end{cases}
\tag{A-2}
\]

Note that $\tilde{x}$ is the average of the (n/2)th and [(n/2) + 1]th terms when n is even.
Suppose, for example, the following two lists of sorted numbers are given:

List A: 3, 3, 5, 7, 8
List B: 1, 2, 5, 5, 7, 8, 8, 9

List A has 5 terms; hence the middle term is the third term. Thus, its median $\tilde{x} = 5$. List B has 8
terms; hence there are two middle terms, the fourth term 5 and the fifth term 7. Thus, its median
$\tilde{x} = 6$, the average of the two middle terms.

The cumulative frequency distribution can be used to find the median of an arbitrary set
of data.

One property of the median $\tilde{x}$ is that there are just as many numbers less than $\tilde{x}$ as there are
greater than $\tilde{x}$.

Suppose the data are grouped. The cumulative frequency distribution can be used to find the 
class with the median. Then the class value is sometimes used as an approximation to the median or, 
for a better approximation, one can linearly interpolate in the class to find an approximation to the 
median. 


EXAMPLE A.4 


(a) Consider the data in Example A.1 which gives the number of tenants in 45 apartments. Here n = 45; hence
the median $\tilde{x}$ is the twenty-third value. The cumulative frequency row tells us that $\tilde{x} = 3$.


(b) Consider the data in Example A.2 which gives the 6:00 p.m. temperatures for a 35-day period. The median
is the eighteenth value, and its exact value can be found by using the original data before they are grouped
into classes. Using the grouped data, we can find an approximation to the median in two ways. Note, first,
using the cumulative frequency row, that the median is the second value in the group 85-90 which has five
values. Thus:

(i) Simply let $\tilde{x} = 87.5$, the class value of the group.

(ii) Linearly interpolate in the class to obtain

\[ \tilde{x} = 85 + \tfrac{2}{5}(5) = 87.0 \]


Clearly (ii) will usually give a better approximation to the median. 
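Method (ii) can be sketched as follows. The function name grouped_median is hypothetical, and the code follows the convention used in this example: the median has rank (n + 1)/2, and the interpolation fraction is its rank within the class divided by the class frequency. Other texts use slightly different interpolation formulas, so treat this as one reasonable reading rather than a definitive rule.

```python
from fractions import Fraction as F

def grouped_median(lower_bounds, width, freqs):
    """Approximate the median of grouped data by linear interpolation in its class."""
    n = sum(freqs)
    position = F(n + 1, 2)          # rank of the median, 18 when n = 35
    cumulative = 0                  # number of items in the classes already passed
    for low, f in zip(lower_bounds, freqs):
        if cumulative + f >= position:
            within = position - cumulative   # rank inside this class
            return low + within / f * width
        cumulative += f
    raise ValueError("no data")

lower_bounds = [70, 75, 80, 85, 90, 95, 100, 105]
freqs = [3, 6, 7, 5, 8, 3, 0, 3]
median_estimate = grouped_median(lower_bounds, 5, freqs)
print(float(median_estimate))
```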


Midrange 


The midrange of a sorted sample $x_1, x_2, \ldots, x_n$ is the average of the smallest value $x_1$ and the largest
value $x_n$. That is,

Midrange:
\[ \text{mid} = \frac{x_1 + x_n}{2} \tag{A-3} \]

For the data in Example A.1, $x_1 = 1$ and $x_n = 6$. Thus

\[ \text{mid} = \frac{1 + 6}{2} = 3.5 \]

For the data in Example A.2, $x_1 = 72.5$ and $x_n = 107.5$. Thus

\[ \text{mid} = \frac{72.5 + 107.5}{2} = 90.0 \]

(Again we use class values rather than the original data for our formula.)
Additional Measurements


(1) Weighted Mean (Weighted Arithmetic Mean): Suppose each value $x_i$ is associated with a
nonnegative weighting factor $w_i$. Then the weighted mean is defined as follows:

Weighted mean:
\[ \bar{x}_w = \frac{\sum w_i x_i}{\sum w_i} = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_k x_k}{w_1 + w_2 + \cdots + w_k} \tag{A-4} \]

Here $\sum w_i$ is the total weight. Note that Formula (A-1b) is a special case of Formula (A-4) where
the weight of $x_i$ is its frequency.


(2) Grand Mean: Suppose there are k samples and the ith sample has mean $\bar{x}_i$ and $n_i$ elements.
Then the grand mean, denoted by $\bar{\bar{x}}$ (read: x-double bar), is defined as follows:

Grand mean:
\[ \bar{\bar{x}} = \frac{\sum n_i \bar{x}_i}{\sum n_i} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2 + \cdots + n_k \bar{x}_k}{n_1 + n_2 + \cdots + n_k} \tag{A-5} \]
Population Mean

Suppose $x_1, x_2, \ldots, x_N$ are the N numerical values of some population. The formula for the
population mean, denoted by the Greek letter $\mu$ (read: mu), follows:

Population mean:
\[ \mu = \frac{x_1 + x_2 + \cdots + x_N}{N} = \frac{\sum x_i}{N} \]

Population mean:
\[ \mu = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_k x_k}{f_1 + f_2 + \cdots + f_k} = \frac{\sum f_i x_i}{\sum f_i} \]

(We emphasize that N denotes the number of elements in the population whereas n denotes the
number of elements in a sample of the population.)

Remark: Observe that the formula for the population mean $\mu$ is the same as the formula for the
sample mean $\bar{x}$. On the other hand, there are formulas for the population which are not the same as
the corresponding formulas for the sample. For example, the formula for the (population) standard
deviation $\sigma$ (Section A.4) is not the same as the formula for the sample standard deviation s.


A.4 MEASURES OF DISPERSION: VARIANCE AND STANDARD DEVIATION


Consider the following two samples of n = 7 numerical values:

List A: 7, 9, 9, 10, 10, 11, 14
List B: 7, 7, 8, 10, 11, 13, 14


Observe that the median (middle value) of each list is $\tilde{x} = 10$. Furthermore, the following shows that
both lists have the same mean $\bar{x} = 10$:


List A: $\bar{x} = \dfrac{7+9+9+10+10+11+14}{7} = \dfrac{70}{7} = 10$
List B: $\bar{x} = \dfrac{7+7+8+10+11+13+14}{7} = \dfrac{70}{7} = 10$


Although both lists have the same first and last elements, the values in list A are clustered more closely
about the mean than the values in list B. This section will discuss important ways of measuring such
dispersions of data.


Variance and Standard Deviation


Consider a sample of values $x_1, x_2, \ldots, x_n$, and suppose $\bar{x}$ is the mean of the sample. The difference
$x_i - \bar{x}$ is called the deviation of the data value $x_i$ from the mean $\bar{x}$; it is positive or negative according
as $x_i$ is greater or less than $\bar{x}$. The sample variance, denoted by $s^2$, is defined as the sum of the squares
of the deviations divided by n - 1. Namely,


Sample variance: $s^2 = \dfrac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1} = \dfrac{\sum (x_i - \bar{x})^2}{n - 1}$ (A-6a)

Sample variance: $s^2 = \dfrac{f_1(x_1 - \bar{x})^2 + f_2(x_2 - \bar{x})^2 + \cdots + f_k(x_k - \bar{x})^2}{(f_1 + f_2 + \cdots + f_k) - 1} = \dfrac{\sum f_i (x_i - \bar{x})^2}{(\sum f_i) - 1}$ (A-6b)


The nonnegative square root of the sample variance $s^2$, denoted by s, is called the sample standard
deviation. That is,


Sample standard deviation: $s = \sqrt{s^2}$ (A-7)


If the data are organized into classes, then we use the ith class value for $x_i$ in the above Formula
(A-6b).

The data in most applications and examples will come from some sample; hence we may simply
say variance and standard deviation, omitting the adjective "sample".

Since each squared deviation is nonnegative, so is the variance $s^2$. Moreover, $s^2$ is zero precisely
when the data values are all equal (and, therefore, are all equal to the mean $\bar{x}$). Accordingly, if
the data are more spread out, then the variance $s^2$ and the standard deviation s will be larger.

One advantage of the use of the standard deviation s over the variance $s^2$ is that the standard
deviation s will have the same units as the original data.
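As a minimal sketch, the definitions in Formulas (A-6a) and (A-7) can be applied to lists A and B from the start of this section. The helper name `sample_variance` is an illustrative assumption; Python's standard `statistics.variance` also divides by n - 1 and is used only as a cross-check.

```python
import math
import statistics


def sample_variance(data):
    """Formula (A-6a): squared deviations from the mean, divided by n - 1."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)


list_a = [7, 9, 9, 10, 10, 11, 14]
list_b = [7, 7, 8, 10, 11, 13, 14]

for name, data in (("A", list_a), ("B", list_b)):
    s2 = sample_variance(data)
    s = math.sqrt(s2)  # Formula (A-7)
    # statistics.variance also divides by n - 1, so it serves as a cross-check.
    assert math.isclose(s2, statistics.variance(data))
    print(f"List {name}: s^2 = {s2:.2f}, s = {s:.2f}")
```

The more dispersed list B should report the larger variance and standard deviation.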


EXAMPLE A.5 Consider the lists A and B above.

(a) List A has a mean $\bar{x} = 10$. The following are the deviations of the 7 data values:


7 - 10 = -3, 9 - 10 = -1, 9 - 10 = -1, 10 - 10 = 0, 10 - 10 = 0, 11 - 10 = 1, 14 - 10 = 4


The squares of the deviations are as follows:


$(-3)^2 = 9$, $(-1)^2 = 1$, $(-1)^2 = 1$, $0^2 = 0$, $0^2 = 0$, $1^2 = 1$, $4^2 = 16$


Also, n - 1 = 7 - 1 = 6. Therefore, the sample variance $s^2$ and standard deviation s are derived as
follows:


$s^2 = \dfrac{9 + 1 + 1 + 0 + 0 + 1 + 16}{6} = \dfrac{28}{6} \approx 4.67$ and $s = \sqrt{4.67} \approx 2.16$


(b) List B also has a mean $\bar{x} = 10$. The deviations of the data and their squares follow:


$(-3)^2 = 9$, $(-3)^2 = 9$, $(-2)^2 = 4$, $0^2 = 0$, $1^2 = 1$, $3^2 = 9$, $4^2 = 16$


Again, n - 1 = 6. Accordingly, the sample variance $s^2$ and standard deviation s are derived as follows:


$s^2 = \dfrac{9 + 9 + 4 + 0 + 1 + 9 + 16}{6} = \dfrac{48}{6} = 8$ and $s = \sqrt{8} \approx 2.83$


Note that list B, which exhibits more dispersion than A, has a larger variance and standard deviation than
list A.


Alternate Formulas for Sample Variance


Alternate formulas for the sample variance, that is, formulas which are equivalent to Formulas (A-6a) and
(A-6b), are as follows:


Sample variance: $s^2 = \dfrac{\sum x_i^2 - (\sum x_i)^2/n}{n - 1}$ (A-8a)

Sample variance: $s^2 = \dfrac{\sum f_i x_i^2 - (\sum f_i x_i)^2/\sum f_i}{(\sum f_i) - 1}$ (A-8b)


Again, if the data are organized into classes, then we use the class values as approximations to the 
original values in the above Formula (A-8b). 

Although Formulas (A-8a) and (A-8b) may look more complicated than Formulas (A-6a) and
(A-6b), they are usually more convenient to use. In particular, these formulas only use one
subtraction in the numerator, and they can be used without first calculating the sample mean $\bar{x}$.
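To see why Formula (A-8a) agrees with Formula (A-6a), one can expand the squared deviations and substitute $\bar{x} = (\sum x_i)/n$. The following short derivation is supplied here as an illustration and is not part of the original text:

```latex
\begin{aligned}
\sum (x_i - \bar{x})^2
  &= \sum x_i^2 - 2\bar{x}\sum x_i + n\bar{x}^2 \\
  &= \sum x_i^2 - \frac{2\left(\sum x_i\right)^2}{n} + \frac{\left(\sum x_i\right)^2}{n} \\
  &= \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}
\end{aligned}
```

Dividing both sides by n - 1 turns the left side into (A-6a) and the right side into (A-8a).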


EXAMPLE A.6 Consider the following n = 9 data values:


3, 5, 8, 9, 10, 12, 13, 15, 20


Find: (a) mean $\bar{x}$, (b) variance $s^2$ and standard deviation s.
First construct the following table where the two numbers on the right, 95 and 1217, denote the sums $\sum x_i$ and
$\sum x_i^2$, respectively:


x_i   | 3   5    8    9    10    12    13    15    20  | 95
x_i^2 | 9   25   64   81   100   144   169   225   400 | 1217


(It is currently common practice and notationally convenient to write numbers and their sum horizontally rather
than vertically.)


(a) By Formula (A-1a), where n = 9,


$\bar{x} = (\sum x_i)/n = 95/9 \approx 10.56$

(b) Here we use Formula (A-8a) with n = 9 and n - 1 = 8:


$s^2 = \dfrac{1217 - (95)^2/9}{8} = \dfrac{1217 - 1002.78}{8} \approx 26.78$


Then $s = \sqrt{26.78} \approx 5.17$


Note that if we used Formula (A-6a), we would need to subtract $\bar{x} = 10.56$ from each $x_i$ before squaring.


EXAMPLE A.7 Consider the data in Example A.1 which gives the number of tenants in 45 apartments. The
sample mean $\bar{x} = 2.8$ was obtained in Example A.3. Find the sample variance $s^2$ and the sample standard
deviation s.

First extend the frequency distribution table of the data as follows (where SUM refers to $\sum f_i$, $\sum f_i x_i$, and
$\sum f_i x_i^2$):


Number of people x_i | 1   2    3    4     5    6  | SUM
Frequency f_i        | 8   14   7    12    3    1  | 45
f_i x_i              | 8   28   21   48    15   6  | 126
x_i^2                | 1   4    9    16    25   36 |
f_i x_i^2            | 8   56   63   192   75   36 | 430


Then, by Formula (A-8b),

$s^2 = \dfrac{430 - (126)^2/45}{44} \approx 1.75$ and $s \approx 1.32$


Note that n = 45 and n - 1 = 44.
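The computation in Example A.7 can be sketched in Python directly from the frequency table, following Formula (A-8b). The function name `frequency_variance` is an illustrative assumption.

```python
import math


def frequency_variance(values, freqs):
    """Formula (A-8b): sample variance computed from a frequency table."""
    n = sum(freqs)                                            # sum of f_i
    sum_fx = sum(f * x for x, f in zip(values, freqs))        # sum of f_i * x_i
    sum_fx2 = sum(f * x * x for x, f in zip(values, freqs))   # sum of f_i * x_i^2
    return (sum_fx2 - sum_fx ** 2 / n) / (n - 1)


# Tenant data of Example A.7
values = [1, 2, 3, 4, 5, 6]
freqs = [8, 14, 7, 12, 3, 1]

s2 = frequency_variance(values, freqs)
print(round(s2, 2), round(math.sqrt(s2), 2))  # 1.75 1.32
```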


Measures of Position: Quartiles and Five-Number Summary 


Consider a set of n data values $x_1, x_2, \ldots, x_n$ which are arranged in increasing order. Recall that
the median $M = \tilde{x}$ of the data values has been defined as a number for which, at most, half of the values
are less than M and, at most, half of the values are greater than M. Here "half" means n/2 when
n is even and (n - 1)/2 when n is odd. Specifically,


Median: $M = \begin{cases} \dfrac{x_k + x_{k+1}}{2} & \text{when } n \text{ is even and } n = 2k \\ x_{k+1} & \text{when } n \text{ is odd and } n = 2k + 1 \end{cases}$


The first, second, and third quartiles, $Q_1$, $Q_2$, $Q_3$, are defined as follows:


$Q_1$ = median of the first half of the values
$Q_2$ = M = median of all the values
$Q_3$ = median of the second half of the values


The 5-number summary of the data is the following quintuple:


$[L, Q_1, M, Q_3, H]$


where $L = x_1$ is the lowest value, $Q_1$, $M = Q_2$, $Q_3$ are the quartiles, and $H = x_n$ is the highest value.
The range of the above data is the distance between the lowest and highest value, and the
interquartile range (IQR) is the distance between the first and third quartiles; namely,


range = $H - L$ and IQR = $Q_3 - Q_1$


Observe that
Range interval: $[L, H]$ contains 100 percent of the data values.
IQR interval: $[Q_1, Q_3]$ contains about 50 percent of the data values.
Also, observe that the 5-number summary $[L, Q_1, M, Q_3, H]$ or, equivalently, the 4 intervals,

$[L, Q_1]$, $[Q_1, M]$, $[M, Q_3]$, $[Q_3, H]$


divide the data into 4 sets where each set contains about 25 percent of the data values.


EXAMPLE A.8 Consider the following two lists of n = 7 numerical values:

List A: 7, 9, 9, 10, 10, 11, 14
List B: 7, 7, 8, 10, 11, 13, 14


The median of both lists is the fourth value M = 10. Find the quartiles $Q_1$ and $Q_3$, the 5-number summary
$[L, Q_1, M, Q_3, H]$, and the range and interquartile range (IQR) of each list. Compare the range and IQR of
both lists.


(a) The median M = 10 of list A divides the set into the first half {7, 9, 9} and the second half {10, 11, 14}. Hence
$Q_1 = 9$ and $Q_3 = 11$. Also, L = 7 is the lowest value and H = 14 is the highest value. Thus, the 5-number
summary of list A follows:


$[L, Q_1, M, Q_3, H] = [7, 9, 10, 11, 14]$


Furthermore


range = H - L = 14 - 7 = 7 and IQR = $Q_3 - Q_1$ = 11 - 9 = 2


(b) The median M = 10 of list B divides the set into the first half {7, 7, 8} and the second half {11, 13, 14}. Hence
$Q_1 = 7$ and $Q_3 = 13$. Also, L = 7 is the lowest value and H = 14 is the highest value. Thus, the 5-number
summary of list B is as follows:


$[L, Q_1, M, Q_3, H] = [7, 7, 10, 13, 14]$


Furthermore range = 14 - 7 = 7 and IQR = 13 - 7 = 6

Although list B exhibits more dispersion than list A, the ranges of both lists are the same. However, the IQR = 6
of list B is much larger than the IQR = 2 of list A. Generally speaking, the IQR usually gives a more accurate
description of the dispersion of a list than the range since the range may be strongly influenced by a single small
or large value.


EXAMPLE A.9 Consider the following list of n = 30 numerical values:


4   5   5   7   8   8   9   10  10  11  11  11  12  12  12
13  13  14  14  14  15  16  16  18  18  19  19  20  22  25

Find the median M, the quartiles $Q_1$ and $Q_3$, the 5-number summary $[L, Q_1, M, Q_3, H]$, and the range and
interquartile range (IQR) of the data.

Here n = 30 is even, so the median M is the average of the fifteenth and sixteenth values. Thus


$M = \dfrac{12 + 13}{2} = 12.5$


The first quartile $Q_1$ is the median of the first half (first 15) numbers, so $Q_1 = 10$, the eighth number of the first half
sublist. The third quartile $Q_3$ is the median of the second half (second 15) numbers, so $Q_3 = 16$, the eighth number
of the second half sublist. Here, L = 4 and H = 25, so the 5-number summary follows:


$[L, Q_1, M, Q_3, H] = [4, 10, 12.5, 16, 25]$

Furthermore: range = H - L = 25 - 4 = 21 and IQR = $Q_3 - Q_1$ = 16 - 10 = 6
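The quartile procedure used in Examples A.8 and A.9 can be sketched in Python as follows. The function names are illustrative assumptions. The sketch follows the convention of the examples, in which the median belongs to neither half when n is odd; other quartile conventions exist, so library routines may return slightly different values.

```python
def median(sorted_values):
    """Middle value, or the average of the two middle values when n is even."""
    n = len(sorted_values)
    k = n // 2
    if n % 2 == 1:
        return sorted_values[k]
    return (sorted_values[k - 1] + sorted_values[k]) / 2


def five_number_summary(data):
    """Return [L, Q1, M, Q3, H], taking the halves on either side of the median.

    When n is odd the median itself belongs to neither half, which is the
    convention used in Examples A.8 and A.9.
    """
    values = sorted(data)
    n = len(values)
    first_half = values[: n // 2]
    second_half = values[(n + 1) // 2:]
    return [values[0], median(first_half), median(values),
            median(second_half), values[-1]]


list_a = [7, 9, 9, 10, 10, 11, 14]
low, q1, m, q3, high = five_number_summary(list_a)
print([low, q1, m, q3, high], "range =", high - low, "IQR =", q3 - q1)
```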


A.5 BIVARIATE DATA, SCATTERPLOTS, CORRELATION COEFFICIENTS 


Quite often in statistics it is desired to determine the relationship, if any, between two variables, 
such as between age and weight, weight and height, years of education and salary, amount of daily 
exercise and cholesterol level, and so on. Letting x and y denote the two variables, the data will 
consist of a list of pairs of numerical values 


$(x_1, y_1), (x_2, y_2), (x_3, y_3), \ldots, (x_n, y_n)$


where the first values correspond to the variable x and the second values correspond to y.

As with a single variable, we can describe such bivariate data both graphically and numerically.
Our primary concern is to determine whether there is a mathematical relationship, such as a
linear relationship, between the data.

It should be kept in mind that a statistical relationship between two variables does not necessarily 
imply there is a causal relationship between them. For example, a strong relationship between weight 
and height does not imply that one variable causes the other. On the other hand, eating more does 
usually increase the weight of a person but it does not usually mean there will be an increase in the 
height of the person. 


Scatterplots 


Consider a list of pairs of numerical values representing variables x and y. The scatterplot of the
data is simply a picture of the pairs of values as points in a coordinate plane $\mathbf{R}^2$. The picture
sometimes indicates a relationship between the points as illustrated in the following examples.


EXAMPLE A.10


(a) Consider the following data where x denotes the ages of 6 children and y denotes the corresponding number
of correct answers in a 10-question test:


The scatterplot of the data appears in Fig. A-3(a). The picture of the points indicates, roughly speaking, that
the number of correct answers increases as the age increases. We then say that x and y have a positive
correlation.


Fig. A-3


(b) Consider the following data where x denotes the average daily temperature, in degrees Fahrenheit, and y
denotes the corresponding daily natural gas consumption, in cubic feet:


x | 50    45    40    38    32    40    55
y | 2.5   5.0   6.2   7.4   8.3   4.7   1.8


The scatterplot of the data appears in Fig. A-3(b). The picture of the points indicates, roughly speaking,
that the gas consumption decreases as the temperature increases. We then say that x and y have a negative
correlation.


(c) Consider the following data where x denotes the average daily temperature, in degrees Fahrenheit, over a
6-day period and y denotes the corresponding number of defective traffic lights:


x | 72   78   75   74   78   76


The scatterplot of the data appears in Fig. A-3(c). The picture of the points indicates that there is no
apparent relationship between x and y.


Correlation Coefficient


Scatterplots indicate graphically whether there is a linear relationship between two variables x and
y. A numeric indicator of such a linear relationship is the sample correlation coefficient r of x and y,
which is defined as follows:


Sample correlation coefficient: $r = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2}\,\sqrt{\sum (y_i - \bar{y})^2}}$ (A-9)


We assume the denominator in Formula (A-9) is not zero. It can be shown that the correlation
coefficient r has the following properties:


(1) $-1 \le r \le 1$.


(2) r > 0 if y tends to increase as x increases and r < 0 if y tends to decrease as x increases.


(3) The stronger the linear relationship between x and y, the closer r is to -1 or 1; the weaker the
linear relationship between x and y, the closer r is to 0.


An alternate formula for computing r is given below; we then illustrate the above properties of r with 
examples. 


Another numerical measurement of bivariate data with variables x and y is the sample covariance
which is denoted and defined as follows:


Sample covariance of x and y: $s_{xy} = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$ (A-10)


Formula (A-9) can now be written in the more compact form


Sample correlation coefficient: $r = \dfrac{s_{xy}}{s_x s_y}$


where $s_x$ and $s_y$ are the sample standard deviations of x and y, respectively, and $s_{xy}$ is the sample
covariance of x and y defined above.
An alternate formula for computing the correlation coefficient r follows:


$r = \dfrac{\sum x_i y_i - (\sum x_i)(\sum y_i)/n}{\sqrt{\sum x_i^2 - (\sum x_i)^2/n}\,\sqrt{\sum y_i^2 - (\sum y_i)^2/n}}$ (A-11)


This formula is very convenient to use after forming a table with the values of $x_i$, $y_i$, $x_i^2$, $y_i^2$, $x_i y_i$, and
their sums, as illustrated below.


EXAMPLE A.11 Find the correlation coefficient r for each data set in Example A.10.


(a) Construct the following table which gives the x, y, $x^2$, $y^2$, and xy values, and the last column gives the
corresponding sums:


Now use Formula (A-11), with n = 6 points, to obtain


$r = \dfrac{299 - (39)(45)/6}{\sqrt{259 - (39)^2/6}\,\sqrt{347 - (45)^2/6}} = \dfrac{6.50}{\sqrt{5.50}\,\sqrt{9.50}} \approx 0.899$


Here r is close to 1, which is expected since the scatterplot in Fig. A-3(a) indicates a strong positive linear
relationship between x and y.


(b) Construct the following table which gives the x, y, $x^2$, $y^2$, and xy values, and the last column gives the
corresponding sums:


x   | 50      45      40      38      32      40      55    | 300
y   | 2.5     5.0     6.2     7.4     8.3     4.7     1.8   | 35.9
x^2 | 2500    2025    1600    1444    1024    1600    3025  | 13,218
y^2 | 6.25    25.00   38.44   54.76   68.89   22.09   3.24  | 218.67
xy  | 125.0   225.0   248.0   281.2   265.6   188.0   99.0  | 1431.8


Formula (A-11), with n = 7, yields


$r = \dfrac{1431.8 - (300)(35.9)/7}{\sqrt{13{,}218 - (300)^2/7}\,\sqrt{218.67 - (35.9)^2/7}} = \dfrac{-106.77}{\sqrt{360.86}\,\sqrt{34.554}} \approx -0.9562$


Here r is close to -1, and the scatterplot in Fig. A-3(b) indicates a strong negative linear relationship
between x and y.


(c) Construct the following table which gives the x, y, $x^2$, $y^2$, and xy values, and the last column gives the
corresponding sums:


Formula (A-11), with n = 6, yields


$r = \dfrac{1585 - (453)(21)/6}{\sqrt{34{,}229 - (453)^2/6}\,\sqrt{83 - (21)^2/6}} = \dfrac{-0.500}{\sqrt{27.50}\,\sqrt{9.5}} \approx -0.031$


Here r is close to 0, which is expected since the scatterplot in Fig. A-3(c) indicates no linear relationship
between x and y.
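The table-based computation of Formula (A-11) can be sketched in Python using the temperature and gas consumption data of Example A.10(b). The function and variable names are illustrative assumptions.

```python
import math


def correlation(xs, ys):
    """Formula (A-11): r computed from the sums of x, y, x^2, y^2, and xy."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = sum_xy - sum_x * sum_y / n
    denominator = (math.sqrt(sum_x2 - sum_x ** 2 / n)
                   * math.sqrt(sum_y2 - sum_y ** 2 / n))
    return numerator / denominator


# Temperature and gas consumption data of Example A.10(b)
temperature = [50, 45, 40, 38, 32, 40, 55]
gas = [2.5, 5.0, 6.2, 7.4, 8.3, 4.7, 1.8]

r = correlation(temperature, gas)
print(round(r, 3))  # -0.956
```

As the text assumes for Formula (A-9), the sketch presumes a nonzero denominator, that is, neither variable is constant.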


A.6 METHODS OF LEAST SQUARES, REGRESSION LINE, CURVE FITTING 


Suppose a scatterplot of the data points (x;, y;) indicates a linear relationship between variables x 
and y or, alternately, suppose the correlation coefficient r of x and y is close to 1 or —1. Then the next 
step is to find a line L that, in some sense, fits the data. The line L we choose is called the least-squares 
line. This section discusses this line, and then we discuss more general types of curve fitting. 


Least-Squares Line 


Consider a given set of data points $P_i(x_i, y_i)$ and any (nonvertical) linear equation L. Let $y_i^*$
denote the y value of the point on L corresponding to $x_i$. Furthermore, let $d_i = y_i - y_i^*$, the difference
between the actual value of y and the value of y on the curve or, in other words, the vertical (directed)
distance between the point $P_i$ and the line L, as shown in Fig. A-4. The sum


$\sum d_i^2 = d_1^2 + d_2^2 + \cdots + d_n^2$


is called the squares error between the line L and the data points.


Fig. A-4    Fig. A-5


The least-squares line or the line of best fit or the regression line of y on x is, by definition, the line
L whose squares error is as small as possible. It can be shown that such a line L exists and is
unique. Let a denote the y intercept of the line L and let b denote its slope, that is, suppose the
following is the equation of the line L:


y = a + bx


Then a and b can be obtained from the following two equations, called the normal equations, in the
two unknowns a and b where n is the number of points:


Normal equations:
$na + (\sum x_i)b = \sum y_i$
$(\sum x_i)a + (\sum x_i^2)b = \sum x_i y_i$ (A-12)


In particular, the slope b and y intercept a can also be obtained from the following formula (where r
is the correlation coefficient):


$b = \dfrac{r s_y}{s_x}$ and $a = \bar{y} - b\bar{x}$ (A-13)


Formula (A-13) is usually used instead of Formula (A-12) when one needs, or has already found, the
means $\bar{x}$ and $\bar{y}$, the standard deviations $s_x$ and $s_y$, and the correlation r of the given data points.
Graphing the line L of best fit requires at least two points on L. The second equation in Formula
(A-13) tells us that $(\bar{x}, \bar{y})$ lies on the regression line L since


$\bar{y} = (\bar{y} - b\bar{x}) + b\bar{x} = a + b\bar{x}$


Also, the first equation in Formula (A-13) then tells us that the point $(\bar{x} + s_x, \bar{y} + r s_y)$ is also on L.
These points are also pictured in Fig. A-5.


Remark: Recall that the above line L which minimizes the squares of the vertical distances
from the given points $P_i$ to L is called the regression line of y on x; it is usually used when one
views y as a function of x. A line L' also exists which minimizes the squares of the horizontal
distances of the points $P_i$ from L'; it is called the regression line of x on y. Given any two variables,
the data usually indicate that one of them depends upon the other; we then let x denote the
independent variable and let y denote the dependent variable. For example, suppose the variables
are age and height. We normally assume height is a function of age, so we would let x denote age
and y denote height. Accordingly, unless otherwise stated, our least-squares lines will be regression
lines of y on x.


EXAMPLE A.12 Find the line L of best fit for the first two scatterplots in Fig. A-3.

(a) By the table in Example A.11(a),

$\sum x_i = 39$, $\sum y_i = 45$, $\sum x_i^2 = 259$, $\sum x_i y_i = 299$


Also, there are n = 6 points. Substitution in the normal equations in Formula (A-12) yields the following
system:


6a + 39b = 45
39a + 259b = 299


The solution of the system follows:


$a = -\tfrac{2}{11} \approx -0.18$ and $b = \tfrac{13}{11} \approx 1.18$


Thus, the following is the line L of best fit.

y = -0.18 + 1.18x


To graph L, we need only plot two points on L and then draw the line through these points. Setting x = 5
and x = 8, we obtain the two points:


A(5, 5.7) and B(8, 9.3)


and then we draw L, as shown in Fig. A-6(a).


Fig. A-6  (a) y = -0.18 + 1.18x  (b) y = 17.810 - 0.2959x


(b) Here we use Formula (A-13) rather than Formula (A-12). By Example A.11(b), with n = 7, we obtain

$r = -0.9562$, $\bar{x} = 300/7 = 42.86$, $\bar{y} = 35.9/7 = 5.129$

Using Formulas (A-8a) and (A-7), we obtain


$s_x = \sqrt{\dfrac{13{,}218 - (300)^2/7}{6}} = 7.7552$ and $s_y = \sqrt{\dfrac{218.67 - (35.9)^2/7}{6}} = 2.3998$


Substituting these values in Formula (A-13), we get


$b = \dfrac{(-0.9562)(2.3998)}{7.7552} = -0.2959$ and $a = 5.1286 - (-0.2959)(42.857) = 17.810$


Thus, the line L of best fit follows:


y = 17.810 - 0.2959x


The graph of L, obtained by plotting (30, 8.933) and (42.8571, 5.1286) (approximately) and drawing the line
through these points, is shown in Fig. A-6(b).
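As a sketch under stated assumptions, the normal equations (A-12) can be solved directly for a and b. The function name is illustrative, and the 2 x 2 system is solved here by Cramer's rule, which is one convenient choice and not necessarily the method intended by the text. The data are those of Example A.10(b).

```python
def least_squares_line(xs, ys):
    """Solve the normal equations (A-12) for the intercept a and slope b."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    # Cramer's rule on:  n*a + sum_x*b = sum_y,  sum_x*a + sum_x2*b = sum_xy
    det = n * sum_x2 - sum_x ** 2
    if det == 0:
        raise ValueError("all x values are equal; the line is not determined")
    a = (sum_y * sum_x2 - sum_x * sum_xy) / det
    b = (n * sum_xy - sum_x * sum_y) / det
    return a, b


# Temperature and gas consumption data of Example A.10(b)
temperature = [50, 45, 40, 38, 32, 40, 55]
gas = [2.5, 5.0, 6.2, 7.4, 8.3, 4.7, 1.8]

a, b = least_squares_line(temperature, gas)
print(f"y = {a:.3f} + ({b:.4f})x")
```

The slope agrees with Example A.12(b); the intercept may differ from the worked value in the last decimal place because the example rounds its intermediate quantities.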


Curve Fitting 


Sometimes the scatterplot does not indicate a linear relationship between the variables x and y,
but one may visualize some other standard (well-known) curve, y = f(x), which may approximate the
data, called an approximate curve. Several such standard curves, where letters other than x and y
denote constants, follow:


(1) Parabolic curve: $y = a_0 + a_1 x + a_2 x^2$


(2) Cubic curve: $y = a_0 + a_1 x + a_2 x^2 + a_3 x^3$


(3) Hyperbolic curve: $y = \dfrac{1}{a + bx}$ or $\dfrac{1}{y} = a + bx$


(4) Exponential curve: $y = ab^x$ or $\log y = a_0 + a_1 x$

(5) Geometric curve: $y = ax^b$ or $\log y = \log a + b \log x$

(6) Modified exponential curve: $y = ab^x + c$

(7) Modified geometrical curve: $y = ax^b + c$


Pictures of some of these standard curves appear in Fig. A-7.


(a) Parabolic (b) Exponential (c) Hyperbolic 


Fig. A-7 


Generally speaking, it is not easy to decide which curve to use for a given set of data points. On 
the other hand, it is usually easier to determine a linear relationship by looking at the scatterplot or 
by using the correlation coefficient. Thus, it is standard procedure to find the scatterplot of 
transformed data. Specifically: 

(a) If log y versus x indicates a linear relationship, use the exponential curve (type 4).
(b) If 1/y versus x indicates a linear relationship, use the hyperbolic curve (type 3).
(c) If log y versus log x indicates a linear relationship, use the geometric curve (type 5).


Once one decides upon the type of curve to be used, then that particular curve is the one that
minimizes the squares error. We state this formally:


Definition: Consider a collection of curves and a given set of data points. The best-fitting or
least-squares curve C in the collection is the curve which minimizes the sum


$\sum d_i^2 = d_1^2 + d_2^2 + \cdots + d_n^2$


(where $d_i$ denotes the vertical distance from a data point $P_i(x_i, y_i)$ to the curve C).


Just as there are formulas to compute the constants a and b in the regression line L for a set of 
data points, so there are formulas to compute the constants in the best-fitting curve C in any of the 
above types (collections) of curves. The derivation of such formulas usually involves calculus. 


EXAMPLE A.13 Consider the following data which indicates exponential growth:


x | 1   2    3    4     5     6
y | 6   18   55   160   485   1460


Find the least-squares exponential curve C for the data, and plot the data points and C on the plane $\mathbf{R}^2$.


The curve C has the form $y = ab^x$ where a and b are unknowns. The logarithm (to base 10) of $y = ab^x$
yields


$\log y = \log a + x \log b = a' + b'x$


where $a' = \log a$ and $b' = \log b$. Thus, we seek the least-squares line L for the following data:


x     | 1        2        3        4        5        6
log y | 0.7782   1.2553   1.7404   2.2041   2.6857   3.1644


Using the normal equations in Formula (A-12) for L, we get

a' = 0.3028, b' = 0.4767

The antilogarithms of a' and b' yield, approximately,

a = 2.0, b = 3.0

Thus, $y = 2(3^x)$ is the required exponential curve C. The data points and C are plotted in Fig. A-8.
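The steps of Example A.13 can be sketched in Python: take base-10 logarithms of y, fit a least-squares line to the pairs (x, log y), and take antilogarithms of the intercept and slope. The function name is an illustrative assumption. Note that this procedure minimizes the squares error in log y rather than in y itself, as in the example.

```python
import math


def fit_exponential(xs, ys):
    """Fit y = a * b**x by a least-squares line through (x, log10 y)."""
    logs = [math.log10(y) for y in ys]
    n = len(xs)
    sum_x, sum_l = sum(xs), sum(logs)
    sum_x2 = sum(x * x for x in xs)
    sum_xl = sum(x * l for x, l in zip(xs, logs))
    det = n * sum_x2 - sum_x ** 2
    a_prime = (sum_l * sum_x2 - sum_x * sum_xl) / det   # a' = log a
    b_prime = (n * sum_xl - sum_x * sum_l) / det        # b' = log b
    return 10 ** a_prime, 10 ** b_prime                 # antilogarithms


xs = [1, 2, 3, 4, 5, 6]
ys = [6, 18, 55, 160, 485, 1460]

a, b = fit_exponential(xs, ys)
print(round(a, 1), round(b, 1))  # 2.0 3.0
```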


Fig. A-8


Solved Problems


FREQUENCY DISTRIBUTION, MEAN AND MEDIAN 


A.1. Consider the following frequency distribution which gives the number f of students who got x 
correct answers on a 20-question exam: 


x (correct answers) | 9 10 12 13 14 15 16 17 18 19 20 


f (number of students) | 1 2 1 2 7 2 1 7 2 6 4 


(a) Display the data in a histogram and a frequency polygon. 


(b) Find the mean $\bar{x}$, median M, and midrange of the data.


(a) The histogram appears in Fig. A-9. The frequency polygon also appears in Fig. A-9; it is obtained
from the histogram by connecting the midpoints of the tops of the rectangles in the histogram.


x (correct answers) 


Fig. A-9 


(b) First we extend our frequency table to include the cumulative distribution function cf, the products
$f_i x_i$, and the sums $\sum f_i$ and $\sum f_i x_i$ as follows:


x  | 9   10   12   13   14   15   16   17    18   19    20 |
f  | 1   2    1    2    7    2    1    7     2    6     4  | 35
cf | 1   3    4    6    13   15   16   23    25   31    35 |
fx | 9   20   12   26   98   30   16   119   36   114   80 | 560


By Formula (A-1b), the mean is

$\bar{x} = \dfrac{\sum f_i x_i}{\sum f_i} = \dfrac{560}{35} = 16$

There are n = 35 scores, so the median M is the eighteenth score. The row cf in the table tells us that
16 is the sixteenth score, and 17 is the seventeenth to twenty-third scores. Hence the median


M = 17, the eighteenth score

The midrange is the average of the first score 9 and the last score 20; hence:


$\text{mid} = \dfrac{9 + 20}{2} = 14.5$

A.2. Consider the following n = 20 data items:


3 5 3 4 4 7 6 5 2 4
2 5 5 6 4 3 5 4 5 5

(a) Construct the frequency distribution f and cumulative distribution cf of the data, and
display the data in a histogram.
(b) Find the mean $\bar{x}$, median M, and midrange of the data.


(a) Construct the following frequency distribution table which also includes the products $f_i x_i$ and the
sums $\sum f_i$ and $\sum f_i x_i$:


x  | 2   3   4    5    6    7  |
f  | 2   3   5    7    2    1  | 20
cf | 2   5   10   17   19   20 |
fx | 4   9   20   35   12   7  | 87


Note that the first line of the table consists of the range of numbers, from 2 to 7. The second line 
(frequency) can be obtained by either counting the number of times each number occurs or by going 
through the list one number after another and keeping a tally count, a running account as each 
number occurs. The histogram is shown in Fig. A-10(a). 


(b) Here we use Formula (A-1b) which gives the mean $\bar{x}$ for grouped data:


$\bar{x} = \dfrac{\sum f_i x_i}{\sum f_i} = \dfrac{87}{20} = 4.35$


Fig. A-10  (a) Frequency versus x  (b) Frequency versus scores


There are n = 20 numbers, so the median M is the average of the tenth and eleventh numbers. The
row cf in the table tells us that 4 is the tenth number and 5 is the eleventh number. Hence


$M = \dfrac{4 + 5}{2} = 4.5$


The midrange is the average of the first number 2 and the last number 7; hence

$\text{mid} = \dfrac{2 + 7}{2} = 4.5$
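The tallying described in Problem A.2 can be sketched with Python's standard library. The variable names are illustrative assumptions; `collections.Counter` does the tally and `itertools.accumulate` produces the cumulative frequencies.

```python
from collections import Counter
from itertools import accumulate

# Data of Problem A.2
data = [3, 5, 3, 4, 4, 7, 6, 5, 2, 4,
        2, 5, 5, 6, 4, 3, 5, 4, 5, 5]

counts = Counter(data)
values = sorted(counts)                      # distinct values x_i
freqs = [counts[x] for x in values]          # frequencies f_i
cum_freqs = list(accumulate(freqs))          # cumulative frequencies cf
mean = sum(f * x for x, f in zip(values, freqs)) / sum(freqs)

print(values)     # [2, 3, 4, 5, 6, 7]
print(freqs)      # [2, 3, 5, 7, 2, 1]
print(cum_freqs)  # [2, 5, 10, 17, 19, 20]
print(mean)       # 4.35
```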
A.3. Consider the following n = 20 scores on a statistics exam:

74 80 65 85 95 72 76 72 93 84
75 75 60 74 75 63 78 87 90 70


(a) Construct the frequency distribution f table where the data are grouped into four
classes:


60-70, 70-80, 80-90, 90-100


The table should include the class values $x_i$ and the cumulative distribution cf of the
data. (Recall that if a number falls on a class boundary, it is assigned to the higher class.)
Also, display the data in a histogram.


(b) Find the mean $\bar{x}$, median M, and midrange of the data.


(a) Construct the following frequency distribution table which also includes the products $f_i x_i$ and the
sums $\sum f_i$ and $\sum f_i x_i$:


Class | 60-70   70-80   80-90   90-100 |
x_i   | 65      75      85      95     |
f     | 3       10      4       3      | 20
cf    | 3       13      17      20     |
fx    | 195     750     340     285    | 1570


The histogram is shown in Fig. A-10(b).

(b) Using the class values $x_i$, Formula (A-1b) yields


$\bar{x} = \dfrac{\sum f_i x_i}{\sum f_i} = \dfrac{1570}{20} = 78.5$


There are n = 20 numbers, so the median M is the average of the tenth and eleventh scores, which
we approximate using their class values. The row cf in the table tells us that 75 is the approximation
of the tenth and eleventh scores. Thus


M = 75

The midrange is the average of the first class value 65 and the last class value 95; hence


$\text{mid} = \dfrac{65 + 95}{2} = 80$
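The grouping rule of Problem A.3, where a value on a class boundary goes to the higher class, can be sketched in Python. The function name and its parameters are illustrative assumptions; the classes are taken to have equal width.

```python
def class_frequencies(data, lower, width, num_classes):
    """Count the values in each class [lower + i*width, lower + (i+1)*width).

    A value on a class boundary goes to the higher class, as in Problem A.3.
    """
    freqs = [0] * num_classes
    for x in data:
        index = int((x - lower) // width)
        if not 0 <= index < num_classes:
            raise ValueError(f"{x} lies outside the classes")
        freqs[index] += 1
    return freqs


# Scores of Problem A.3
scores = [74, 80, 65, 85, 95, 72, 76, 72, 93, 84,
          75, 75, 60, 74, 75, 63, 78, 87, 90, 70]

freqs = class_frequencies(scores, lower=60, width=10, num_classes=4)
class_values = [65, 75, 85, 95]              # midpoints of the four classes
grouped_mean = sum(f * x for x, f in zip(class_values, freqs)) / sum(freqs)

print(freqs)         # [3, 10, 4, 3]
print(grouped_mean)  # 78.5
```

Under this rule a score equal to the upper boundary of the last class (100 here) falls outside every class, so the sketch raises an error for it rather than guessing a placement.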


A.4. The yearly rainfall, measured to the nearest tenth of a centimeter, for a 30-year period
follows:


42.3  35.7  47.5  31.2  28.3  37.0  41.3  32.4  41.3  29.3
34.3  35.2  43.0  36.3  35.7  41.5  43.2  30.7  38.4  46.5
43.2  31.7  36.8  43.6  45.2  32.8  30.7  36.2  34.7  35.3

(a) Construct the frequency distribution f table where the data are grouped into 10 classes:

28-30, 30-32, 32-34, ..., 46-48

The table should include the class values (cv) $x_i$ and the cumulative distribution (cf) of the
data.

(b) Find the mean $\bar{x}$, median M, and midrange of the data.

(a) Construct the following frequency distribution table which also includes the products $f_i x_i$ and the
sums $\sum f_i$ and $\sum f_i x_i$:

Class  | 28-30  30-32  32-34  34-36  36-38  38-40  40-42  42-44  44-46  46-48 |
cv x_i | 29     31     33     35     37     39     41     43     45     47    |
f      | 2      4      2      6      4      1      3      5      1      2     | 30
cf     | 2      6      8      14     18     19     22     27     28     30    |
fx     | 58     124    66     210    148    39     123    215    45     94    | 1122

(b) Using the class values $x_i$, Formula (A-1b) yields

$\bar{x} = \dfrac{\sum f_i x_i}{\sum f_i} = \dfrac{1122}{30} = 37.4$


There are n = 30 numbers, so the median M is the average of the fifteenth and sixteenth class
values. The row cf in the table tells us that 37 is the fifteenth and sixteenth class value. Thus


M = 37

The midrange is the average of the first class value 29 and the last class value 47; hence


$\text{mid} = \dfrac{29 + 47}{2} = 38$

MEASURES OF DISPERSION: VARIANCE, STANDARD DEVIATION, IQR 


A.5. Consider the following n = 10 data values:


1, 2, 2, 3, 4, 5, 7, 8, 9, 9

(a) Find the sample mean $\bar{x}$.
(b) Find the variance $s^2$ and standard deviation s.
(c) Find the median M, 5-number summary $[L, Q_1, M, Q_3, H]$, range, and interquartile range
(IQR) of the data.


(a) The mean $\bar{x}$ is the "average" of the numbers, the sum of the values divided by the number n = 10
of values:


$\bar{x} = \dfrac{1 + 2 + 2 + 3 + 4 + 5 + 7 + 8 + 9 + 9}{10} = \dfrac{50}{10} = 5$


(b) Method 1: Here we use Formula (A-6a). We have

$s^2 = \dfrac{\sum (x_i - \bar{x})^2}{n - 1} = \dfrac{16 + 9 + 9 + 4 + 1 + 0 + 4 + 9 + 16 + 16}{9} = \dfrac{84}{9} \approx 9.33$ and $s = \sqrt{9.33} \approx 3.05$


Method 2: Here we use Formula (A-8a). First construct the following table where the two
numbers on the right, 50 and 334, denote the sums $\sum x_i$ and $\sum x_i^2$, respectively:


x_i   | 1   2   2   3   4    5    7    8    9    9  | 50
x_i^2 | 1   4   4   9   16   25   49   64   81   81 | 334


We have


$s^2 = \dfrac{\sum x_i^2 - (\sum x_i)^2/n}{n - 1} = \dfrac{334 - (50)^2/10}{9} = \dfrac{84}{9} \approx 9.33$ and $s = \sqrt{9.33} \approx 3.05$


(c) Here n = 10 is even; hence the median M is the average of the fifth and sixth values. Thus


$M = \dfrac{4 + 5}{2} = 4.5$

The median M = 4.5 divides the 10 items into two halves, A = {1, 2, 2, 3, 4} and B = {5, 7, 8, 9, 9}, each
with 5 numbers. The first quartile $Q_1$ is the median (middle element) of the first half A, so $Q_1 = 2$;
the third quartile $Q_3$ is the median (middle number) of the second half B, so $Q_3 = 8$. Here L = 1
is the lowest number and H = 9 is the highest number. Thus, the 5-number summary of the data
follows:


$[L, Q_1, M, Q_3, H] = [1, 2, 4.5, 8, 9]$

Furthermore: range = H - L = 9 - 1 = 8 and IQR = $Q_3 - Q_1$ = 8 - 2 = 6


A.6. The ages of n = 30 children living in an apartment complex are as follows:


2 3 3 1 2 2 3 4 4 2 3 2 6 2 4
1 2 6 4 2 2 3 7 1 2 3 2 4 2 6


(a) Find the frequency distribution of the data.
(b) Find the sample mean $\bar{x}$, variance $s^2$, and standard deviation s for the data.
(c) Find the median M, the 5-number summary $[L, Q_1, M, Q_3, H]$, the range, and the IQR
(interquartile range) of the data.


(a) Construct the following frequency table which also includes the cumulative distribution function cf; the
products $f_i x_i$, $x_i^2$, $f_i x_i^2$; and the sums $\sum f_i$, $\sum f_i x_i$, and $\sum f_i x_i^2$:


x_i       | 1   2    3    4    5    6     7  |
f_i       | 3   12   6    5    0    3     1  | 30
cf        | 3   15   21   26   26   29    30 |
f_i x_i   | 3   24   18   20   0    18    7  | 90
x_i^2     | 1   4    9    16   25   36    49 |
f_i x_i^2 | 3   48   54   80   0    108   49 | 342
(b) We have

$\bar{x} = \dfrac{\sum f_i x_i}{\sum f_i} = \dfrac{90}{30} = 3$

Also $s^2 = \dfrac{\sum f_i x_i^2 - (\sum f_i x_i)^2/n}{n - 1} = \dfrac{342 - (90)^2/30}{29} = \dfrac{72}{29} \approx 2.48$ and $s \approx 1.58$


(c) Here n = 30 is even; hence the median M is the average of the fifteenth and sixteenth ages. The row
cf in the table tells us that 2 is the fifteenth age and 3 is the sixteenth age. Thus


$M = \dfrac{2 + 3}{2} = 2.5$


The median M = 2.5 divides the 30 items into two halves, each with 15 ages. The first quartile $Q_1$ is
the median of the first 15 ages, so $Q_1$ is the eighth age; the third quartile $Q_3$ is the median of the last
15 ages, so $Q_3$ is the twenty-third age. Using the cf row in the table, we obtain


$Q_1 = 2$ and $Q_3 = 4$


Furthermore, L = 1 is the lowest number and H = 7 is the highest number. Thus, the 5-number
summary of the data follows:


$[L, Q_1, M, Q_3, H] = [1, 2, 2.5, 4, 7]$


Furthermore: range = H - L = 7 - 1 = 6 and IQR = $Q_3 - Q_1$ = 4 - 2 = 2


A.7. Consider the following list of n = 18 data values:

2, 7, 4, 1, 6, 4, 8, 15, 12, 7, 3, 16, 1, 2, 11, 5, 15, 4

(a) Find the median M.


(b) Find the quartiles $Q_1$ and $Q_3$, the 5-number summary $[L, Q_1, M, Q_3, H]$, the range, and the
IQR (interquartile range) of the data.


(a) First arrange the data in numerical order:

1, 1, 2, 2, 3, 4, 4, 4, 5, 6, 7, 7, 8, 11, 12, 15, 15, 16

There are n = 18 values, so the median M is the average of the ninth and tenth values. Thus


$M = \dfrac{5 + 6}{2} = 5.5$

(b) $Q_1$ is the median of the nine values, from 1 to 5, less than M. Thus, $Q_1 = 3$, the fifth value. $Q_3$ is
the median of the nine values, from 6 to 16, greater than M. Thus, $Q_3 = 11$, the fifth value. Also,
L = 1 is the lowest number and H = 16 is the highest number. Thus, the 5-number summary of the
data follows:


$[L, Q_1, M, Q_3, H] = [1, 3, 5.5, 11, 16]$

Furthermore: range = H - L = 16 - 1 = 15 and IQR = $Q_3 - Q_1$ = 11 - 3 = 8


A.8. Consider the following frequency distribution:


x | 1   2   3   4   5   6
f | 2   4   6   8   3   2


(a) Find the sample mean X, variance s’, and standard deviation s for the data. 


(b) Find the median M, the quartiles Q; and Q3, the 5-number summary [L, Q:, M, Q2, H], the 
range, and the IOR (interquartile range) of the data. 
(a) Extend the frequency table to include the cumulative frequency cf, the products f_i x_i, x_i², f_i x_i²,
and the sums Σf_i, Σf_i x_i, and Σf_i x_i² as follows:

x_i       | 1   2   3    4   5   6 | Sum
f_i       | 2   4   6    8   3   2 |  25
cf        | 2   6  12   20  23  25 |
f_i x_i   | 2   8  18   32  15  12 |  87
x_i²      | 1   4   9   16  25  36 |
f_i x_i²  | 2  16  54  128  75  72 | 347

Therefore

$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{87}{25} = 3.48$$

Also

$$s^2 = \frac{\sum f_i x_i^2 - (\sum f_i x_i)^2/n}{n-1} = \frac{347 - (87)^2/25}{24} = \frac{44.24}{24} \approx 1.84 \quad\text{and}\quad s \approx 1.36$$

(b) Here n = 25 is odd; hence the median M is the thirteenth number. The row cf in the table tells us
that M = 4. The median M = 4 divides the 25 numbers into two halves, each with 12 numbers. The
first quartile Q1 is the median of the first 12 numbers, so Q1 is the average of the sixth number 2 and
the seventh number 3. Thus, Q1 = 2.5. The third quartile Q3 is the median of the last 12 numbers,
the fourteenth to twenty-fifth numbers, so Q3 is the average of the nineteenth number 4 and the
twentieth number 4. Thus, Q3 = 4. Furthermore, L = 1 is the lowest number and H = 6 is the
highest number. Thus, the 5-number summary of the data is as follows:

[L, Q1, M, Q3, H] = [1, 2.5, 4, 4, 6]

Furthermore: range = H - L = 6 - 1 = 5 and IQR = Q3 - Q1 = 4 - 2.5 = 1.5


MISCELLANEOUS PROBLEMS INVOLVING ONE VARIABLE 


A.9. An English class for foreign students consists of 20 French students, 25 Italian students, and 15
Spanish students. On an exam, the French students average 78, the Italian students 75, and the
Spanish students 76. Find the mean grade for the class.


Here we use Formula (A-5) for the grand mean (the weighted mean of the means) with

n1 = 20, n2 = 25, n3 = 15, x̄1 = 78, x̄2 = 75, x̄3 = 76

This yields

$$\bar{x} = \frac{\sum n_i \bar{x}_i}{\sum n_i} = \frac{20(78) + 25(75) + 15(76)}{20 + 25 + 15} = \frac{4575}{60} = 76.25$$

That is, 76.25 is the mean grade for the class.


A.10. A history class contains 10 freshmen, 15 sophomores, 10 juniors, and 5 seniors. On an exam,
the freshmen average 72, the sophomores 76, the juniors 78, and the seniors 80. Find the mean
grade for the class.


Here we use Formula (A-5) for the grand mean with

n1 = 10, n2 = 15, n3 = 10, n4 = 5, x̄1 = 72, x̄2 = 76, x̄3 = 78, x̄4 = 80

Therefore:

$$\bar{x} = \frac{\sum n_i \bar{x}_i}{\sum n_i} = \frac{10(72) + 15(76) + 10(78) + 5(80)}{10 + 15 + 10 + 5} = \frac{3040}{40} = 76$$

That is, 76 is the mean grade for the class.
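The grand mean in the two class problems above is a weighted average of group means, which a few lines of Python can reproduce. The function name `grand_mean` is only illustrative.

```python
def grand_mean(groups):
    """groups is a list of (size, mean) pairs; returns the weighted mean of the means."""
    total = sum(n for n, _ in groups)
    return sum(n * xbar for n, xbar in groups) / total

english = grand_mean([(20, 78), (25, 75), (15, 76)])
history = grand_mean([(10, 72), (15, 76), (10, 78), (5, 80)])
print(english, history)
```

Note that simply averaging the group means (for example (78 + 75 + 76)/3) would ignore the group sizes and generally gives a different number.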
BIVARIATE DATA


A.11. Consider data sets whose scatterplots appear in Fig. A-11. Estimate the correlation coefficient
r for each data set if the choice is one of -1.5, -0.9, 0.0, 0.9, 1.5.


The correlation coefficient r must lie in the interval [-1, 1]. Moreover, r is close to 1 if the data are
approximately linear with positive slope, r is close to -1 if the data are approximately linear with negative
slope, and r is close to 0 if there is no relationship between the points. Accordingly:


(a) r is close to 1 since there appears to be a strong linear relationship between the points with positive
slope; hence r ≈ 0.9.
(b) r ≈ 0.0 since there appears to be no relationship between the points.
(c) r is close to -1 since there appears to be a strong linear relationship between the points but with
negative slope; hence r ≈ -0.9.

(a) (b) (c)


Fig. A-11


A.12. Consider the following list of data values:

x | 4   2  10   5  8
y | 8  12   4  10  2

(a) Plot the data in a scatterplot.
(b) Compute the correlation coefficient r.
(c) For the x and y values, find the means x̄ and ȳ, and standard deviations s_x and s_y.
(d) Find L, the least-squares line y = a + bx.
(e) Graph L on the scatterplot in part (a).


(a) The scatterplot (with L) is shown in Fig. A-12(a).


(b) Construct the following table which contains the x, y, x², y², and xy values and where the last column
gives the corresponding sums:

x   |  4    2   10    5   8 |  29
y   |  8   12    4   10   2 |  36
x²  | 16    4  100   25  64 | 209
y²  | 64  144   16  100   4 | 328
xy  | 32   24   40   50  16 | 162

Now use Formula (A-11) and the number of points n = 5 to obtain

$$r = \frac{162 - [(29)(36)]/5}{\sqrt{209 - (29)^2/5}\,\sqrt{328 - (36)^2/5}} = \frac{-46.8}{\sqrt{40.8}\,\sqrt{68.8}} = -0.8833$$

(The fact that r is close to -1 is expected since the scatterplot indicates a strong linear relationship
with negative slope.)


(c) Use the above table and Formula (A-1a) to obtain

x̄ = Σx/n = 29/5 = 5.8 and ȳ = Σy/n = 36/5 = 7.2

Also, by Formulas (A-8a) and (A-7),

$$s_x = \sqrt{\frac{209 - (29)^2/5}{4}} = 3.194 \quad\text{and}\quad s_y = \sqrt{\frac{328 - (36)^2/5}{4}} = 4.147$$

(d) Substitute r, s_x, s_y into Formula (A-13) to obtain the slope b of the least-squares line L:

$$b = \frac{r s_y}{s_x} = \frac{(-0.8833)(4.147)}{3.194} = -1.147$$

Now substitute x̄, ȳ, and b into Formula (A-13) to determine the y intercept a of L:

a = ȳ - b x̄ = 7.2 - (-1.147)(5.8) = 13.85

Hence L is

y = 13.85 - 1.147x

Alternately, we can find a and b using the normal equations in Formula (A-12) with n = 5:

na + (Σx)b = Σy          5a + 29b = 36
                    or
(Σx)a + (Σx²)b = Σxy     29a + 209b = 162

(These equations would be used if we did not also want r, x̄ and ȳ, and s_x and s_y.)


(e) To graph L, we find two points on L and draw the line through them. One of the two points is
(x̄, ȳ) = (5.8, 7.2) (which always lies on any least-squares line). Another point is (10, 2.4), which is
obtained by substituting x = 10 in the regression equation L and solving for y. The line L appears in the
scatterplot in Fig. A-12(a).


(a) (b)
Fig. A-12
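The steps of Problem A.12 can be mirrored in a short Python sketch. The function name `regression_summary` is hypothetical, and the five data points are an assumption read off the products listed in Problem A.14(a); the shortcut sums correspond to the quantities under the radicals in Formula (A-11).

```python
from math import sqrt

def regression_summary(xs, ys):
    """Return (r, a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs) - sum_x ** 2 / n                 # sum of x^2 minus (sum x)^2 / n
    syy = sum(y * y for y in ys) - sum_y ** 2 / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    r = sxy / sqrt(sxx * syy)
    b = sxy / sxx                      # algebraically the same as r * s_y / s_x
    a = sum_y / n - b * sum_x / n      # the line passes through (x-bar, y-bar)
    return r, a, b

r, a, b = regression_summary([4, 2, 10, 5, 8], [8, 12, 4, 10, 2])
print(round(r, 4), round(a, 2), round(b, 3))
```

Computing b as sxy / sxx avoids the rounding that creeps in when r, s_x, and s_y are rounded first and then multiplied.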
A.13. Repeat Problem A.12 for the following data:

x | 1  3  4   7
y | 3  4  8  10

(a) The scatterplot (with L) is shown in Fig. A-12(b).


(b) Construct the following table which contains the x, y, x², y², and xy values and where the last column
gives the corresponding sums:

x   | 1   3   4    7 |  15
y   | 3   4   8   10 |  25
x²  | 1   9  16   49 |  75
y²  | 9  16  64  100 | 189
xy  | 3  12  32   70 | 117

Now use Formula (A-11) and the number of points n = 4 to obtain

$$r = \frac{117 - [(15)(25)]/4}{\sqrt{75 - (15)^2/4}\,\sqrt{189 - (25)^2/4}} = \frac{23.25}{\sqrt{18.75}\,\sqrt{32.75}} = 0.9382$$

(The fact that r is close to +1 is expected since the scatterplot indicates a strong linear relationship
with positive slope.)


(c) Use the above table and Formula (A-1a) to obtain

x̄ = Σx/n = 15/4 = 3.75 and ȳ = Σy/n = 25/4 = 6.25

Also, by Formulas (A-8a) and (A-7):

$$s_x = \sqrt{\frac{75 - (15)^2/4}{3}} = 2.5 \quad\text{and}\quad s_y = \sqrt{\frac{189 - (25)^2/4}{3}} = 3.304$$

(d) Substitute r, s_x, s_y into Formula (A-13) to obtain the slope b of the least-squares line L:

$$b = \frac{r s_y}{s_x} = \frac{(0.9382)(3.304)}{2.5} = 1.24$$

Now substitute x̄, ȳ, and b into Formula (A-13) to determine the y intercept a of L:

a = ȳ - b x̄ = 6.25 - (1.24)(3.75) = 1.60

Hence L is

y = 1.60 + 1.24x

Alternately, we can find a and b using the normal equations in Formula (A-12) with n = 4:

na + (Σx)b = Σy          4a + 15b = 25
                    or
(Σx)a + (Σx²)b = Σxy     15a + 75b = 117

(These equations would be used if we did not also want r, x̄ and ȳ, and s_x and s_y.)


(e) To graph L, we find two points on L and draw the line through them. One point is
(x̄, ȳ) = (3.75, 6.25). Another point is (0, 1.60), the y intercept. The line L appears on the
scatterplot in Fig. A-12(b).
A.14. The definition of the sample covariance s_xy of variables x and y follows:

$$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n-1}$$

Find s_xy for the data in: (a) Problem A.12, (b) Problem A.13.

(a) The above formula for s_xy yields

s_xy = [(4 - 5.8)(8 - 7.2) + (2 - 5.8)(12 - 7.2) + (10 - 5.8)(4 - 7.2) + (5 - 5.8)(10 - 7.2)
        + (8 - 5.8)(2 - 7.2)]/4
     = [-1.44 - 18.24 - 13.44 - 2.24 - 11.44]/4 = -46.8/4 = -11.7

We note that the variances s_x² and s_y² are always nonnegative but the covariance s_xy can be negative,
which indicates that y tends to decrease as x increases.


(b) The above formula for s_xy yields

s_xy = [(1 - 3.75)(3 - 6.25) + (3 - 3.75)(4 - 6.25) + (4 - 3.75)(8 - 6.25) + (7 - 3.75)(10 - 6.25)]/3
     = [8.9375 + 1.6875 + 0.4375 + 12.1875]/3 = 23.25/3 = 7.75

The covariance here is positive, which indicates that y tends to increase as x increases.
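A direct translation of the covariance definition into Python may help when checking sums of products by hand. The name `sample_covariance` is illustrative, and the data points are assumed to be those whose deviations appear in part (a) above.

```python
def sample_covariance(xs, ys):
    """Sum of products of deviations from the means, divided by n - 1."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    return sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / (n - 1)

s_xy = sample_covariance([4, 2, 10, 5, 8], [8, 12, 4, 10, 2])
print(round(s_xy, 2))
```

The numerator equals Σxy - (Σx)(Σy)/n, the same quantity that appears in the numerator of the correlation coefficient r, which gives a quick consistency check between the two calculations.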


A.15. Let W denote the number of American women graduating with a doctoral degree in
mathematics in a given year. Suppose that, for certain years, W has the following values:

Year | 1985  1990  1995  2000
W    |   28    36    40    45

We assume that the increase, year by year, is approximately linear and that it will increase
linearly in the near future. Estimate W for the years 2005, 2008, and 2010.


Our estimation will use a least-squares line L. For notational and computational convenience we let
the year 1980 be a base for our x values. Hence we set

x = year - 1980 and y = number W of women getting doctoral degrees

Thus, we seek the line y = a + bx of best fit for the data where the unknowns a and b will be determined
by the following normal equations (A-12):

na + (Σx)b = Σy          (Σx)a + (Σx²)b = Σxy

[We do not use Formula (A-13) for a and b since we do not need the correlation coefficient r nor do we
need the values s_x, s_y, x̄, and ȳ.]

The sums in the above system are obtained by constructing the following table which contains the x,
y, x², and xy values and where the last column gives the corresponding sums:

x   |   5   10   15   20 |   50
y   |  28   36   40   45 |  149
x²  |  25  100  225  400 |  750
xy  | 140  360  600  900 | 2000

Substitution in the above normal equations, with n = 4, yields

4a + 50b = 149               4a + 50b = 149
                      or
50a + 750b = 2000            a + 15b = 40

The solution of the system is a = 23.5 and b = 1.1. Thus, the following is our least-squares line L:

y = 23.5 + 1.1x          (A-14)

The (x, y) points and the line L are plotted in Fig. A-13(a).
Substitute 25 (2005), 28 (2008), and 30 (2010) for x in Formula (A-14) to obtain 51, 54.3, and 56.5,
respectively. Thus, one would expect that, approximately, W = 51, W = 54, and W = 57 women will
receive doctoral degrees in the years 2005, 2008, and 2010, respectively.
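The same estimate can be reproduced by solving the two normal equations numerically. The sketch below uses Cramer's rule on the 2 x 2 system; the function name `fit_line` is an assumption made for illustration.

```python
def fit_line(xs, ys):
    """Solve n*a + (sum x)*b = sum y and (sum x)*a + (sum x^2)*b = sum xy for (a, b)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    det = n * sum_xx - sum_x * sum_x          # determinant of the 2 x 2 system
    a = (sum_y * sum_xx - sum_x * sum_xy) / det
    b = (n * sum_xy - sum_x * sum_y) / det
    return a, b

years = [1985, 1990, 1995, 2000]
women = [28, 36, 40, 45]
a, b = fit_line([year - 1980 for year in years], women)
for year in (2005, 2008, 2010):
    print(year, round(a + b * (year - 1980), 1))
```

Extrapolating this far beyond the last observed year rests entirely on the stated assumption that the trend stays linear.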


(a) (b)
Fig. A-13


A.16. Find the least-squares parabola C for the following data:


Plot C and the data points in the plane R².


The parabola C has the form y = a + bx + cx² where the unknowns a, b, c are obtained from the
following normal equations [which are analogous to the normal equations for the least-squares line L in
Formula (A-12)]:

na + (Σx)b + (Σx²)c = Σy
(Σx)a + (Σx²)b + (Σx³)c = Σxy
(Σx²)a + (Σx³)b + (Σx⁴)c = Σx²y

The sums in the above system are obtained by constructing the following table which contains the x,
y, x², x³, x⁴, xy, and x²y values and where the last column gives the corresponding sums:


Substitution in the above normal equations, with n = 6, yields

6a + 34b + 252c = 35,   34a + 252b + 2098c = 183,   252a + 2098b + 18,564c = 1225

The solution of the system yields

a = 3.48, b = 1.70, c = -0.173

Thus, the required parabola C follows:

y = 3.48 + 1.70x - 0.173x²

The given data points and C are plotted in Fig. A-13(b).
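The 3 x 3 system of normal equations above can be solved by hand or with any linear solver. As a rough sketch, the following Python uses plain Gaussian elimination with partial pivoting; the helper name `solve_linear` is hypothetical, and the coefficients are the sums quoted in the text.

```python
def solve_linear(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [v] for row, v in zip(A, rhs)]      # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):                      # back substitution
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

A = [[6, 34, 252], [34, 252, 2098], [252, 2098, 18564]]
a, b, c = solve_linear(A, [35, 183, 1225])
print(round(a, 2), round(b, 2), round(c, 3))
```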


A.17. Derive the normal equations Formula (A-12) for the least-squares line L for n data points
P_i(x_i, y_i).

We want to minimize the following least-squares error:

$$D = \sum d_i^2 = \sum [y_i - (a + bx_i)]^2 = \sum [a + bx_i - y_i]^2$$

where D may be viewed as a function of a and b. The minimum may be obtained by setting the partial
derivatives D_a and D_b equal to zero. The partial derivatives follow:

$$D_a = \sum 2(a + bx_i - y_i) \quad\text{and}\quad D_b = \sum 2(a + bx_i - y_i)x_i$$

Setting D_a = 0 and D_b = 0, we obtain the following required equations:

$$na + \Big(\sum x_i\Big)b = \sum y_i, \qquad \Big(\sum x_i\Big)a + \Big(\sum x_i^2\Big)b = \sum x_i y_i$$


Supplementary Problems 


FREQUENCY DISTRIBUTIONS, MEAN AND MEDIAN 


A.18. The frequency distribution of the weekly wages, in dollars, of a group of unskilled workers follows:

Weekly wages | 140-160  160-180  180-200  200-220  220-240  240-260  260-280
Workers      |   18       24       32       20        8        6        2

(a) Display the data in a histogram and a frequency polygon.
(b) Find the mean x̄, median M, and midrange of the data.


A.19. The amounts of 45 personal loans from a loan company follow:

$700 $450 $725 $1125 $675 $1650 $750 $400 $1050
$500 $750 $850 $1250 $725 $475 $925 $1050 $925
$850 $625 $900 $1750 $700 $825 $550 $925 $850
$475 $750 $550 $725 $575 $575 $1450 $700 $450
$700 $1650 $925 $500 $675 $1300 $1125 $775 $850

(a) Group the data into classes with class width w = $200 and beginning with $400, and construct the
frequency and cumulative frequency distribution for the grouped data.
(b) Display the frequency distribution in a histogram.
(c) Find the mean x̄, median M, and midrange of the data.
A.20. The daily number of station wagons rented by an automobile rental agency during a 30-day period
follows:

7 10 6 7 9 4 7 9 9 8 5 5 7 8 4
6 9 7 12 7 9 10 4 7 5 9 8 9 5 7

(a) Construct the frequency and cumulative frequency distribution for the data.
(b) Find the mean x̄, median M, and midrange of the data.


A.21. The following denotes the number of people living in each of 35 apartments:

Be Ds A Ds cD de, 22" ee D2 De De WD DD D2 2, 2
3 3 3 3 4 4 4 4 4 5 5 5 6 6 7

(a) Construct the frequency and cumulative frequency distribution for the data.
(b) Find the mean x̄, median M, and midrange of the data.


A.22. The students in a mathematics class are divided into four groups:

(a) much greater than the median,     (c) little below the median,
(b) little above the median,          (d) much below the median.

On which group should the teacher concentrate in order to increase the median of the class? The mean of
the class?


MEASURES OF DISPERSION: VARIANCE, STANDARD DEVIATION, IQR 


A.23. The prices of 1 lb of coffee in 7 stores follow:

$5.58, $6.18, $5.84, $5.75, $5.67, $5.95, $5.62

(a) Find the mean x̄, variance s², and standard deviation s.
(b) Find the median M, 5-number summary, and IQR of the data.


A.24. For a given week, the following were the average daily temperatures:

35°F, 33°F, 30°F, 36°F, 40°F, 37°F, 38°F

(a) Find the mean x̄, variance s², and standard deviation s.
(b) Find the median M, 5-number summary, and IQR of the data.


A.25. During a given month, the 10 salespeople in an automobile dealership sold the following number of
automobiles:

13, 17, 10, 18, 17, 9, 17, 13, 15, 14

(a) Find the mean x̄, variance s², and standard deviation s.
(b) Find the median M, 5-number summary, and IQR of the data.


A.26. The ages of students at a college dormitory are recorded, producing the following frequency distribution:

Age x       | 17  18  19  20  21
Frequency f |  5  20  17   6   2

(a) Find the sample mean x̄ and standard deviation s.
(b) Find the median M, 5-number summary, and IQR of the data.
A.27. The following distribution gives the number of hours of overtime during 1 month for employees of a
company:

Overtime, hours |  0  1  2  3  4  5  6  7  8  9  10
Employees       | 10  2  4  2  6  4  2  4  6  2   8

(a) Find the sample mean x̄ and standard deviation s.
(b) Find the median M, 5-number summary, and IQR.


A.28. The following are 40 test scores: 


52 55 58 58 60 61 64 66 66 68 72 75 75 75 76 76 77 77 78 78 
80 80 81 82 82 84 85 85 85 86 88 90 92 95 95 95 100 100 100 100 
(a) Group the data into 5 classes with class width w = 10 beginning with 50 and construct the frequency 
and cumulative frequency distribution for the grouped data. 
(b) Find the sample mean x and standard deviation s of the grouped data. 
(c) Find the median M, 5-number summary, and IQR of the original data.
(d) Find the median M, 5-number summary, and IQR of the grouped data.


A.29. The following distribution gives the number of visits for medical care by 80 patients during a 1-year 
period: 


Number of visits x   |  0   1  2   3  4   6  8
Number of patients f | 14  21  8  15  7  10  5

(a) Find the sample mean x̄ and standard deviation s.
(b) Find the median M, 5-number summary, and IQR.


MISCELLANEOUS PROBLEMS INVOLVING ONE VARIABLE 


A.30. The students at a small school are divided into 4 groups: A, B, C, D. The number n of students in each 
group and the mean score x of each group follow: 


A: n = 80, x̄ = 78; B: n = 60, x̄ = 74; C: n = 85, x̄ = 77; D: n = 75, x̄ = 80


Find the mean of the school. 


A.31. The mode of a list of numerical data is the value which occurs most often and more than once. Find the 
mode of the data in Problems: (a) A.20, (b) A.21, (c) A.26, (d) A.27. 


BIVARIATE DATA 


A.32. Consider the following list of data values:


(a) Draw a scatterplot of the data.
(b) Compute the correlation coefficient r. [Hint: First find Σx_i, Σy_i, Σx_i², Σy_i², Σx_i y_i and then use
Formula (A-11).]
(c) For the x and y values, find the means x̄ and ȳ and standard deviations s_x and s_y.
(d) Find L, the least-squares line y = a + bx.
(e) Graph L on the scatterplot in part (a).


A.33. Repeat Problem A.32 for the following list of data values:


A.34. Find the covariance s_xy of the variables x and y in: (a) Problem A.32, (b) Problem A.33.
(See Problem A.14 for the definition of s_xy.)


A.35. Suppose 7 people in a company are interviewed, yielding the following data where x is the number of years
of service and y is the number of people who reviewed the work of the person:


(a) Draw a scatterplot of the data.
(b) Find L, the least-squares line y = a + bx.
(c) Graph L on the scatterplot in part (a).
(d) Predict the number y of people who reviewed the work of another person if the number of years
worked by the person is: (i) x = 1, (ii) x = 7, (iii) x = 9.


A.36. Consider the following bivariate data:


(a) Find the correlation coefficient r. [Hint: First find Σx_i, Σy_i, Σx_i², Σy_i², Σx_i y_i and then use Formula
(A-11).]
(b) Plot x against y in a scatterplot.
(c) Find the least-squares line L for the data and graph L on the scatterplot in (b).
(d) Find the least-squares hyperbolic curve C which has the form y = 1/(a + bx) or 1/y = a + bx and plot
C on the scatterplot in (b). [Hint: Find the least-squares line for the data points (x_i, 1/y_i).]
(e) Which curve, L or C, best fits the data?


A.37. The following table lists average male weight, in pounds, and height, in inches, for certain ages which range
from 1 to 21:


Find the correlation coefficient r for: (a) age and weight, (b) age and height, (c) weight and height.


A.38. Let x = age, y = height in Problem A.37. (a) Plot x against y in a scatterplot. (b) Find the line L of best
fit. (c) Graph L on the scatterplot in part (a).
A.39. Let x = weight, y = height in Problem A.37. (a) Plot x against y in a scatterplot. (b) Find the line L of
best fit. (c) Graph L on the scatterplot in part (a).


A.40. Find the least-squares exponential curve y = ab^x for the following data:


Answers to Supplementary Problems 


A.18. (a) See Fig. A-14(a); (b) x̄ = $190.36, M = $190, mid = $210.


A.19. (a) The frequency distribution (where the amount is divided by $100 for notational convenience)
follows:

Amount/100      | 4-6  6-8  8-10  10-12  12-14  14-16  16-18
Number of loans |  11   14   10     4      2      1      3

(b) The histogram is shown in Fig. A-14(b).
(c) x̄ = $842.22, M = $700, mid = $1100.

(a) (b)
Fig. A-14
A.20. (a) The distributions follow: 


Daily number of wagons 


Frequency 


Cumulative frequency 


(b) x̄ = 7.3, M = 7, mid = 8.
A.21. (a) The frequency and cumulative frequency distributions follow:


Number of people 
Frequency 


Cumulative frequency 


(b) x̄ = 2.8, M = 2, mid = 4.
A.22. Group (c) to increase the median; likely (b) and (c) to increase the mean.
A.23. (a) x̄ = $5.80, s² = 0.0218, s ≈ 0.15; (b) M = $5.75, [5.58, 5.62, 5.75, 5.95, 6.18], IQR = $0.33.
A.24. (a) x̄ = 35.67, s² = 2.37, s ≈ 1.54; (b) M = 36.5, [30, 33, 36.5, 38, 40], IQR = 5.
A.25. (a) x̄ = 14.3, s² = 9.57, s ≈ 3.1; (b) M = 14.5, [9, 13, 14.5, 17, 18], IQR = 4.
A.26. (a) x̄ = 18.6, s² = 0.939, s ≈ 0.97; (b) M = 18.5, [17, 18, 18.5, 19, 21], IQR = 1.
A.27. (a) x̄ = 4.92 h, s² = 12.97 h², s ≈ 3.60 h; (b) M = 5, [0, 2, 5, 8, 10], IQR = 6.
A.28. (a) The distributions with class values follow:

Scores               | 50-60  60-70  70-80  80-90  90-100
Frequency            |   4      6     10     11      9
Cumulative frequency |   4     10     20     31     40
Class value          |  55     65     75     85     95

Remark: The scores 100 are put in the 90-100 group since there are no scores higher than 100. If
there were scores higher than 100, then the scores 100 would be put in the next higher 100-110 group.

(b) x̄ = 78.8, s² = 222.93, s ≈ 14.9.
(c) M = 79, [52, 69, 79, 87, 100], IQR = 18.
(d) M = 80, [50, 70, 80, 85, 100], IQR = 15.


A.29. (a) x̄ = 2.625, s² = 5.43, s ≈ 2.3; (b) M = 2, [0, 1, 2, 4, 8], IQR = 3.
A.30. x̄ = 77.42.
A.31. (a) 7; (b) 2; (c) 18; (d) 0.


A.32. (a) See Fig. A-15(a).
(b) Σx_i = 17, Σy_i = 41, Σx_i² = 71, Σy_i² = 413, Σx_i y_i = 171, r = 31.6/(√13.2 √76.8) = 0.99.
(c) x̄ = 3.5, ȳ = 8.2, s_x² = 3.30, s_x = 1.82, s_y² = 19.2, s_y = 4.38.
(d) y = 0.13 + 2.38x.
A.33. (a) See Fig. A-15(b).
(b) Σx_i = 13, Σy_i = 13, Σx_i² = 57, Σy_i² = 51, Σx_i y_i = 31, r = -11.25/(√14.75 √8.75) = -0.99.
(c) x̄ = 3.25, ȳ = 3.25, s_x² = 4.92, s_x = 2.22, s_y² = 2.92, s_y = 1.71.
(d) y = 5.75 - 0.77x.

(a) (b)
Fig. A-15

A.34. (a) s_xy = (15.5 + 0.60 + 0.10 + 0.90 + 14.5)/4 = 7.90.
(b) s_xy = (-3.9375 - 0.9375 - 0.1875 - 6.1875)/3 = -3.75.
A.35. (a) and (c) See Fig. A-16(a); (b) y = 18.5 - 1.5x; (d) (i) 17, (ii) 8, (iii) 5.

(a) (b)
Fig. A-16

A.36. (a) Σx_i = 5.7, Σy_i = 5.1, Σx_i² = 11.45, Σy_i² = 10.45, Σx_i y_i = 2.53, r = -3.284/(√4.952 √5.248) = -0.644.
(b) See Fig. A-16(b).
(c) y = 1.78 - 0.66x and Fig. A-16(b).
(d) y = 1/(1.6x) and Fig. A-16(b).
(e) C is a better fit.




A.37. (a) r = 0.98; (b) r = 0.98; (c) r = 0.97. 


A.38. (a) and (c) See Fig. A-17(a); (b) y = 29.22 + 2.13x. 


(a) (b)
Fig. A-17

A.39. (a) and (c) See Fig. A-17(b); (b) y = 28.55 + 0.28x.


A.40. y = 3(2^x).


APPENDIX B 


Chi-Square 
Distribution 


B.1 INTRODUCTION 


One fundamental question in probability and statistical analysis is whether or not a pattern of
observed data fits a given distribution such as a uniform, binomial, or normal distribution or some
prior distribution. Clearly, the data would not fit the distribution exactly, so we would want to have
some criteria of "goodness of fit". The chi-square distribution, denoted by χ² and defined below,
gives such criteria. (Here χ is the Greek letter chi.)

The chi-square distribution is also used to decide whether or not certain variables are independent.
For example, a pollster might want to know whether or not, say, the sex, ethnic background,
or salary range of a person is a factor in his or her vote in an election or for some type of
legislation.

The formal definition of the χ² distribution follows.


Definition: Let Z_1, Z_2, ..., Z_k be k independent standard normal random variables. Then the distribution of

$$\chi^2 = Z_1^2 + Z_2^2 + \cdots + Z_k^2$$

is called the chi-square distribution with k degrees of freedom.


The number k of degrees of freedom, which can be any positive integer including 1, is frequently
denoted by "df". Thus, there is a χ² distribution for each k. Figure B-1 pictures the distribution for
k = 1, 4, 6, 8. The distribution is not symmetric and is skewed to the right. However, for large k the
distribution is close to the normal distribution.


Fig. B-1. Chi-square distribution for k degrees of freedom.
B.2 GOODNESS OF FIT, NULL HYPOTHESIS, CRITICAL VALUES


Suppose a collection of data, say, given by a frequency distribution with n categories, is obtained
from a sample size exceeding 30. Moreover, suppose we want to decide by some test whether or not
the data fit some specific distribution. Let H0 denote the assumption that they do, that is:


H0: Hypothesis that the data fit a given distribution.


Here H0 is called the null hypothesis.


Letting "obs" denote observed data and letting "exp" denote expected data (obtained from the
given distribution), the chi-square value or chi-square statistic for the given data measures the weighted
squares of the differences, that is,

$$\chi^2 = \sum \frac{(\text{obs} - \text{exp})^2}{\text{exp}}$$

Assuming that the expected values are not too small (usually, not less than 5), then the above random
variable has (approximately) the chi-square distribution with

df = n - 1

degrees of freedom. The formula df = n - 1 comes from the fact that a given frequency distribution
with n categories involves n probabilities where n - 1 of the probabilities determine the nth
probability. (See Remark 3 below.)


Clearly, the smaller the χ² value, the better the fit. However, if χ² is too "large", that is, if χ²
exceeds some given critical value c, we say that the fit is poor, and we reject H0. The critical value c
is determined by preassigning a significance level α where:

α = probability that χ² exceeds critical value c = P(χ² ≥ c)

Frequently used choices of α are α = 0.10, α = 0.05, and α = 0.005.

Table B-1 gives critical values for some commonly used significance levels. The significance level
α represents the shaded area in the graph appearing in the table. We emphasize that if the χ² value
exceeds the critical value c (falls in the shaded area), then we say that we reject the null hypothesis
H0 at the α significance level.


The following remarks are in order: 


Remark 1: The observed data come from a sample from a larger population, so the chi-square
values form a discrete random variable. This random variable closely approximates the continuous
χ² distribution when the sample size exceeds 30.


Remark 2: The χ² distribution also assumes that each individual expected value is not too small;
one rule of thumb (noted above) is that no expected value is less than 5.


Remark 3: The formula df = n - 1 assumes that the size is the only statistic of the sample that
is used. If additional statistics of the sample are used, such as the mean x̄ or standard deviation s, then
the degrees of freedom df will be smaller. (See Examples B.4 and B.6.)
Table B-1 Chi-Square Distribution
(α = Probability That χ² Exceeds Critical Value c)

α = shaded area, c = critical value

 df | α = 0.10    0.05   0.025   0.010   0.005
----+------------------------------------------
  1 |     2.71    3.84    5.02    6.63    7.88
  2 |     4.61    5.99    7.38    9.21   10.60
  3 |     6.25    7.81    9.35   11.34   12.84
  4 |     7.78    9.49   11.14   13.28   14.86
  5 |     9.24   11.07   12.83   15.09   16.75
  6 |    10.64   12.59   14.45   16.81   18.55
  7 |    12.02   14.07   16.01   18.48   20.28
  8 |    13.36   15.51   17.54   20.09   21.96
  9 |    14.68   16.92   19.02   21.67   23.59
 10 |    15.99   18.31   20.48   23.21   25.19
 11 |    17.28   19.68   21.92   24.72   26.76
 12 |    18.55   21.03   23.34   26.22   28.30
 13 |    19.81   22.36   24.74   27.69   29.82
 14 |    21.06   23.68   26.12   29.14   31.32
 15 |    22.31   25.00   27.49   30.58   32.80
 16 |    23.54   26.30   28.85   32.00   34.27
 17 |    24.77   27.59   30.19   33.41   35.72
 18 |    25.99   28.87   31.53   34.81   37.16
 19 |    27.20   30.14   32.85   36.19   38.58
 20 |    28.41   31.41   34.17   37.57   40.00
 25 |    34.38   37.65   40.65   44.31   46.93
 30 |    40.26   43.77   46.98   50.89   53.67
 40 |    51.81   55.76   59.34   63.69   66.77
 50 |    63.17   67.51   71.42   76.15   79.49
100 |   118.50  124.30  129.60  135.80  140.20
B.3 GOODNESS OF FIT FOR UNIFORM AND PRIOR DISTRIBUTIONS


This section gives applications of the χ² distribution to goodness-of-fit problems involving a
uniform distribution and a prior distribution.


EXAMPLE B.1 Uniform Distribution A company introduces a new product in 4 locations, A, B, C, D. The
numbers of items sold during a weekend follow:

Location             |  A   B   C   D
Number of items sold | 80  65  70  85

Let H0 be the (null) hypothesis that location does not make a difference. Apply the chi-square test at the
α = 0.10 significance level (90 percent reliability) to accept or reject the null hypothesis H0.

The total number of items sold was 300. Assuming the null hypothesis H0 of a uniform distribution, the
expected sales at each location would be 75. The χ² value for the data follows:

$$\chi^2 = \sum \frac{(\text{obs} - \text{exp})^2}{\text{exp}} = \frac{(80-75)^2}{75} + \frac{(65-75)^2}{75} + \frac{(70-75)^2}{75} + \frac{(85-75)^2}{75} = 3.33$$

There are df = 4 - 1 = 3 degrees of freedom. This is derived from the fact that, assuming the number of
items sold is 300, the sales at 3 of the locations determine the sales at the fourth location. Table B-1 shows that
the critical χ² value for df = 3 at the α = 0.10 significance level is c = 6.25, which is pictured in Fig. B-2. Since
3.33 < 6.25, we accept the null hypothesis H0 of a uniform distribution, that is, the evidence indicates that
location does not make a difference.
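The chi-square statistic in this example is a one-line sum, sketched here in Python for readers who want to check the arithmetic. The function name `chi_square` is illustrative, and the critical value is simply copied from Table B-1 rather than computed.

```python
def chi_square(observed, expected):
    """Sum of (obs - exp)^2 / exp over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

sales = [80, 65, 70, 85]
expected = [sum(sales) / len(sales)] * len(sales)   # uniform hypothesis: 75 per location
stat = chi_square(sales, expected)
critical = 6.25                                      # Table B-1, df = 3, alpha = 0.10
print(round(stat, 2), "reject H0" if stat > critical else "do not reject H0")
```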


Fig. B-2


EXAMPLE B.2 Prior Distribution The following table lists the percentage of grades given by a professor in a certain
course in previous years and the number of such grades for 100 of the professor's students in the current year:

Grade          |   A    B    C    D   F
Previous years | 10%  30%  40%  15%  5%
Current year   |  15   23   32   22   8

Consider the following null hypothesis:

H0: Current students are typical compared to previous students.

Use a chi-square test at the α = 0.05 significance level to accept or reject the null hypothesis H0.

There are 100 current student grades, so the expected number of students for each grade equals the
corresponding percentage. Thus the χ² value of the data follows:

$$\chi^2 = \sum \frac{(\text{obs} - \text{exp})^2}{\text{exp}} = \frac{(15-10)^2}{10} + \frac{(23-30)^2}{30} + \frac{(32-40)^2}{40} + \frac{(22-15)^2}{15} + \frac{(8-5)^2}{5} = 10.8$$

There are df = 5 - 1 = 4 degrees of freedom. This is derived from the fact that there are 100 students so
that any 4 of the entries in the distribution table tell us the fifth entry. Table B-1 shows that the critical χ² value
for df = 4 at the α = 0.05 significance level is c = 9.49, which is pictured in Fig. B-3. Since 10.8 > 9.49, we reject
the null hypothesis H0 that the current students are typical of previous students.


Fig. B-3


B.4 GOODNESS OF FIT FOR BINOMIAL DISTRIBUTION 


This section gives applications of the y° distribution to goodness-of-fit problems involving the 
binomial distribution. 


EXAMPLE B.3_ Binomial Distribution with Probability p Given There are 4 special tourist sights A, B, C, D 
inacity. A poll of 600 tourists indicated the following number of sights visited by each tourist: 


Number of sights | 0 1 2 3 4 


Number of tourists | 130 240 170 52 8 


Let H, be the null hypothesis that the distribution is binomial with p = 0.70. Test the hypothesis at the a = 0.10 
significance level. 
The binomial distribution with n = 4 and p = 0.7 follows: 


P(O) = (0.7)* = 0.240, P(2) = 6(0.3)? (0.7)° = 0.265, P(4) = (0.3)* = 0.008 
P(1) = 4(0.3)(0.7)° = 0.412, P(3) = 4(0.3)3 (0.7) = 0.076 
Multiplying the probabilities by the number of 600 tourists gives the following expected data: 


Number of sights | 0 1 2 3 4 


Expected number of tourists | 144 247 159 45 5

The χ² value of the data follows:

$$\chi^2 = \sum \frac{(\text{obs} - \text{exp})^2}{\text{exp}} = \frac{(130-144)^2}{144} + \frac{(240-247)^2}{247} + \frac{(170-159)^2}{159} + \frac{(52-45)^2}{45} + \frac{(8-5)^2}{5} = 5.21$$


There are df = 5 − 1 = 4 degrees of freedom. This is derived from the fact that the 5 numbers in the table
are related by the equation that their sum is 600. Thus, any 4 of the numbers determine the fifth number.

Table B-1 tells us that c = 7.78 is the critical value for df = 4 and α = 0.10, and this relationship is pictured
in Fig. B-4. Since 5.21 < 7.78, we accept the null hypothesis H₀ that the distribution is binomial with p = 0.70.


[Fig. B-4: χ² curve with critical value c = 7.78; accept H₀ to the left of c, reject H₀ to the right]


Remark: Suppose only 200 tourists were polled instead of 600. Although the sample size does satisfy the 
condition that it exceeds 30, the expected number of tourists visiting all 4 sights would only be 2, which is less than 
5. Thus, with a sample of 200, we would not use the chi-square test to test the hypothesis that the distribution 
is binomial with p = 0.7. 
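The expected counts of Example B.3 and the size check in the Remark can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the function names are hypothetical, the value 0.3 for the probability of visiting a given sight is read off the worked probabilities above (which use 0.7 for missing a sight), and expected counts are rounded to whole numbers as in the text.

```python
from math import comb

def binomial_expected(n, p_success, sample_size):
    """Rounded expected count for each k = 0..n successes under a binomial model."""
    q = 1 - p_success
    return [round(sample_size * comb(n, k) * p_success**k * q**(n - k))
            for k in range(n + 1)]

def chi_square_applicable(expected, sample_size):
    """Rule of thumb from the text: sample exceeds 30, no expected frequency below 5."""
    return sample_size > 30 and min(expected) >= 5

# Assumption: 0.3 is the probability that a tourist visits a given sight.
for tourists in (600, 200):
    expected = binomial_expected(4, 0.3, tourists)
    print(tourists, expected, chi_square_applicable(expected, tourists))
```

With 600 tourists every rounded expected count is at least 5; with 200 tourists the last class drops to 2, which is why the Remark rules out the test for the smaller sample.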


EXAMPLE B.4 Binomial Distribution Using the Sample to Estimate p. A factory makes light bulbs and
ships them in packets of 4. Suppose 5000 packets are tested and the number of defective bulbs in each packet 
is recorded yielding the following distribution: 


Number of defective bulbs | 0 1 2 3 4 


Number of packets | 1975 2170 740 110 5 


Let H₀ be the null hypothesis that the distribution of defective bulbs is binomial. Test the hypothesis at the
α = 0.05 significance level.


Here n = 4 but p is not given. Thus, we use the sample proportion p̂ (read: p hat) of defective bulbs as an
estimate of p. The number d of defective bulbs in all the packets follows:


d = 0(1975) + 1(2170) + 2(740) + 3(110) + 4(5) = 4000

The total number b of bulbs is 4(5000) = 20,000. Thus, we set

$$\hat{p} = \frac{d}{b} = \frac{4000}{20\,000} = 0.2$$


The binomial distribution with n = 4, p = 0.2, and q = 1 − 0.2 = 0.8 follows:

$$P(0) = (0.8)^4 = 0.4096, \quad P(1) = 4(0.2)(0.8)^3 = 0.4096, \quad P(2) = 6(0.2)^2(0.8)^2 = 0.1536,$$
$$P(3) = 4(0.2)^3(0.8) = 0.0256, \quad P(4) = (0.2)^4 = 0.0016$$

Multiplying the probabilities by 5000, the number of packets, yields the following expected distribution:


Number of defective bulbs | 0 1 2 3 4 


Expected number of packets | 2048 2048 768 128 8 


The χ² value of the data follows:

$$\chi^2 = \sum \frac{(\text{obs} - \text{exp})^2}{\text{exp}} = \frac{(1975-2048)^2}{2048} + \frac{(2170-2048)^2}{2048} + \frac{(740-768)^2}{768} + \frac{(110-128)^2}{128} + \frac{(5-8)^2}{8} = 14.5$$


Finding the number of degrees of freedom in this example is different from the previous example. Here
there are two statistics taken from the sample: (a) the size of the sample (5000 packets) and (b) the proportion
p̂ of defective bulbs (or equivalently, 4000 defective bulbs). Thus, the five entries in the frequency table are
related by the following two equations:

$$x_0 + x_1 + x_2 + x_3 + x_4 = 5000, \qquad 0x_0 + 1x_1 + 2x_2 + 3x_3 + 4x_4 = 4000$$

where x_k denotes the number of packets with k defective bulbs. Accordingly, there are only df = 5 − 2 = 3
degrees of freedom; that is, any three of the data entries in the table will yield the remaining two using the two
equations.

Table B-1 tells us that c = 7.81 is the critical value for df = 3 and α = 0.05, as pictured in Fig. B-5. Since
14.5 > 7.81, we reject the null hypothesis H₀ that the distribution is binomial.
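The steps of this example (estimate p from the sample, build the expected distribution, then subtract one extra degree of freedom for the estimated parameter) can be sketched as follows. The variable names are illustrative only.

```python
from math import comb

# Packets with k = 0, 1, 2, 3, 4 defective bulbs (observed frequencies).
counts = [1975, 2170, 740, 110, 5]
n = 4                                    # bulbs per packet

packets = sum(counts)                    # 5000
defective = sum(k * c for k, c in enumerate(counts))   # 4000
p_hat = defective / (n * packets)        # sample proportion of defective bulbs

# Expected number of packets with k defective bulbs under binomial(n, p_hat).
expected = [packets * comb(n, k) * p_hat**k * (1 - p_hat)**(n - k)
            for k in range(n + 1)]

chi2 = sum((o - e) ** 2 / e for o, e in zip(counts, expected))

# One equation for the sample size and one for the estimated proportion.
estimated_parameters = 1
df = len(counts) - 1 - estimated_parameters

print(f"p_hat = {p_hat}, chi-square = {chi2:.2f}, df = {df}")
```

Keeping `estimated_parameters` explicit makes the difference from Example B.3, where p was given and df = 4, visible in the code.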


[Fig. B-5: χ² curve with critical value c = 7.81; accept H₀ to the left of c, reject H₀ to the right]


B.5 GOODNESS OF FIT FOR NORMAL DISTRIBUTION 


This section gives applications of the χ² distribution to goodness-of-fit problems involving the
normal distribution.


EXAMPLE B.5 Normal Distribution with Given μ and σ. Suppose the commuting time T, in minutes, of 300
students at a college has the following distribution:


Time | <20 20-30 30-40 40-50 >50


Number of students | 13 75 120 66 26 


Consider the following null hypothesis: 
H₀: Distribution is normal with mean μ = 35 and standard deviation σ = 10.


Test the null hypothesis at the α = 0.10 significance level.

Using the formula z = (T − μ)/σ, we derive the following z values corresponding to the above T values:


T value | 20 30 40 50 


z value | —1.5 —0.5 0.5 1.5 


Figure B-6 shows the normal curve with the T values, the corresponding z values, and the probability distribution 
for these values obtained from Table 6-1 (page 184) of the standard normal distribution. 


[Fig. B-6: standard normal curve with the T values 20, 30, 40, 50 (minutes) marked against the z values −1.5, −0.5, 0.5, 1.5]


Multiplying each probability in Fig. B-6 by 300 gives the following expected numbers of students for the given 
time periods: 


Time | <20 20-30 30-40 40-50 >50


Expected number of students | 20 72.5 115 72.5 20 


The χ² value of the data follows:

$$\chi^2 = \sum \frac{(\text{obs} - \text{exp})^2}{\text{exp}} = \frac{(13-20)^2}{20} + \frac{(75-72.5)^2}{72.5} + \frac{(120-115)^2}{115} + \frac{(66-72.5)^2}{72.5} + \frac{(26-20)^2}{20} = 5.14$$

There are df = 5 − 1 = 4 degrees of freedom. This is derived from the fact that the five numbers in the table
are related by the equation that their sum is 300. Thus, any four of the numbers determine the fifth number.

Table B-1 tells us that c = 7.78 is the critical value for df = 4 and α = 0.10. Since 5.14 < 7.78, we accept the
null hypothesis H₀ that the distribution of commuting times is normal with μ = 35 and σ = 10.
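The expected class counts in this example come from a table of the standard normal distribution. As a rough sketch, the same counts can be computed with the error function from Python's standard library. The helper names below are hypothetical, and because the code does not round the table probabilities, its output agrees with the worked values only approximately.

```python
from math import erf, inf, sqrt

def normal_cdf(x, mu, sigma):
    """P(X <= x) for a normal variable with mean mu and standard deviation sigma."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

def expected_counts(cut_points, mu, sigma, sample_size):
    """Expected count in each class below c1, c1 to c2, ..., above ck."""
    edges = [-inf, *cut_points, inf]
    return [sample_size * (normal_cdf(hi, mu, sigma) - normal_cdf(lo, mu, sigma))
            for lo, hi in zip(edges, edges[1:])]

# Commuting times: classes <20, 20-30, 30-40, 40-50, >50 for 300 students.
observed = [13, 75, 120, 66, 26]
expected = expected_counts([20, 30, 40, 50], mu=35, sigma=10, sample_size=300)

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print([round(e, 1) for e in expected], round(chi2, 2))
```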


EXAMPLE B.6 Normal Distribution Using Sample for μ and σ. Suppose the heights h, in inches, of a sample
of 500 male students at a college have mean x̄ = 66, standard deviation s = 4, and the following distribution:


Height | <58 58-62 62-66 66-70 70-74 >74 


Number of students | 7 72 162 176 65 18 


Consider the following null hypothesis: 


H₀: Distribution is normal with μ = x̄ = 66 and σ = s = 4.

We emphasize that, unlike the previous example, the mean μ and standard deviation σ of the normal distribution
are not given but are estimated from the sample. Test the null hypothesis H₀ at the: (a) α = 0.10 significance level,
(b) α = 0.05 significance level.

Using the formula z = (h − μ)/σ, we derive the following z values corresponding to the above h values:


h value | 58 62 66 70 74 


z value | —2 —1 0 1 2 


Figure B-7 shows the normal curve with the h values, the corresponding z values, and the probability distribution 
for these values obtained from Table 6-1 of the standard normal distribution. 


[Fig. B-7: standard normal curve with the h values 58, 62, 66, 70, 74 (inches) marked against the z values −2, −1, 0, 1, 2]


Multiplying each probability in Fig. B-7 by 500 gives the following expected numbers of students for the given 
height ranges: 


Height | <58 58-62 62-66 66-70 70-74 >74 


Expected number of students | 11 68 171 171 68 11 


The χ² value of the data follows:

$$\chi^2 = \sum \frac{(\text{obs} - \text{exp})^2}{\text{exp}} = \frac{(7-11)^2}{11} + \frac{(72-68)^2}{68} + \frac{(162-171)^2}{171} + \frac{(176-171)^2}{171} + \frac{(65-68)^2}{68} + \frac{(18-11)^2}{11} = 6.90$$

Finding the number of degrees of freedom in this example is different from the previous example. Here
there are three statistics taken from the sample: the size, the mean, and the standard deviation. Each statistic
yields an equation relating the six numbers in the frequency table: the sum is 500, the mean is 66, and the standard
deviation is 4, and the three equations are independent. Thus, any three of the six data entries in the table will
yield the remaining three data entries using the three equations. Accordingly, in this example, there are only
df = 6 − 3 = 3 degrees of freedom, not the 6 − 1 = 5 that the rule of the previous example would give.


(a) Table B-1 tells us that c = 6.25 is the critical value for df = 3 and α = 0.10. Since 6.90 > 6.25, we reject the
null hypothesis H₀ that the distribution of heights is normal at the α = 10% significance level.


(b) Table B-1 tells us that c = 7.81 is the critical value for df = 3 and α = 0.05. Since 6.90 < 7.81, we accept the
null hypothesis H₀ that the distribution of heights is normal at the α = 5% significance level.
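The degrees-of-freedom bookkeeping and the accept/reject decision of the last two examples can be sketched as below. The helper names are hypothetical, and the small lookup table contains only critical values quoted from Table B-1 in this appendix, so it is an assumption-laden stand-in for the full table.

```python
# Critical values c quoted from Table B-1 in this appendix, keyed by (df, alpha).
CRITICAL_VALUES = {
    (2, 0.10): 4.61, (2, 0.05): 5.99,
    (3, 0.10): 6.25, (3, 0.05): 7.81,
    (4, 0.10): 7.78, (4, 0.05): 9.49,
    (5, 0.10): 9.24, (5, 0.05): 11.07,
    (6, 0.10): 10.64,
}

def degrees_of_freedom(num_classes, num_estimated_parameters=0):
    """Classes minus 1 for the sample size, minus 1 per parameter estimated from the sample."""
    return num_classes - 1 - num_estimated_parameters

def decide(chi2, df, alpha):
    """Return 'reject' if chi2 exceeds the critical value, else 'accept'."""
    return "reject" if chi2 > CRITICAL_VALUES[(df, alpha)] else "accept"

# Heights example: six classes, mean and standard deviation both estimated.
observed = [7, 72, 162, 176, 65, 18]
expected = [11, 68, 171, 171, 68, 11]
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = degrees_of_freedom(len(observed), num_estimated_parameters=2)

for alpha in (0.10, 0.05):
    print(alpha, decide(chi2, df, alpha))
```

The same statistic leads to opposite decisions at the two levels, which is the point of parts (a) and (b).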


B.6 CHI-SQUARE TEST FOR INDEPENDENCE


This section gives applications of the χ² distribution to problems involving the independence of
various attributes. For example, one may want to test whether or not there is a “gender gap” (or “age
gap”) in an election, that is, whether the vote for a given candidate or for some piece of legislation does
or does not depend on the gender (or age) of the voter.

Since the chi-square test is not accurate for small values, we will assume, as before, that our sample
exceeds 30 and that no expected frequency is less than 5.


EXAMPLE B.7 An engineering college has 4 programs:

(i) electrical, (ii) chemical, (iii) mechanical, (iv) civil

Suppose 500 students, of which 300 are male and 200 are female, are distributed in the 4 programs as follows:


Electrical Chemical Mechanical Civil Total 
Male 100 80 70 50 300 
Female 50 50 50 50 200 
Total 150 130 120 100 500 


The 300 and 200 in the last column and the 150, 130, 120, 100 in the last row are called marginal totals, and the 
500 is called the grand total. 

Let H₀ be the null hypothesis that the program choice is independent of gender. Test the null hypothesis
at the: (a) α = 0.10 significance level, (b) α = 0.05 significance level.

First we want to find the eight expected entries in the table assuming independence. Note 300/500 = 60
percent of the students are male and 150/500 = 30 percent of the students are studying electrical engineering.
Thus, the expected number of male students taking electrical engineering is obtained by multiplying the
product of the probabilities by the total number of students, yielding

(60%)(30%)(500) = 90

Equivalently, the expected number can be obtained by multiplying the two marginal totals and dividing by the
grand total, that is,

$$\text{Expected entry} = \frac{(\text{row total})(\text{column total})}{\text{grand total}} = \frac{(300)(150)}{500} = 90$$

This formula is derived from the fact that

$$(60\%)(30\%)(500) = \frac{300}{500} \cdot \frac{150}{500} \cdot 500 = \frac{(300)(150)}{500} = 90$$

The other seven expected numbers are obtained similarly.
Furthermore, rather than forming a new table with the expected values, we add each expected value after the 
corresponding observed value in the above table, say, as follows: 


Electrical Chemical Mechanical Civil Total 
Male 100/90 80/78 70/72 50/60 300 
Female 50/60 50/52 50/48 50/40 200 
Total 150 130 120 100 500 
Some texts place the expected value below or diagonally down from the observed value. The χ² value of the data
is easily obtained from the table as follows:

$$\chi^2 = \sum \frac{(\text{obs} - \text{exp})^2}{\text{exp}} = \frac{(100-90)^2}{90} + \frac{(80-78)^2}{78} + \frac{(70-72)^2}{72} + \frac{(50-60)^2}{60} + \frac{(50-60)^2}{60} + \frac{(50-52)^2}{52} + \frac{(50-48)^2}{48} + \frac{(50-40)^2}{40} = 7.21$$

There are df = (2 − 1)(4 − 1) = 3 degrees of freedom. This is derived from the fact that the marginal values
are given, and so:

(i) Any one value in a column will determine the other value.

(ii) Any three columns will determine the fourth column.

Thus, for example, the first three entries in the first row give us the other five entries in the table.

(a) Table B-1 tells us that c = 6.25 is the critical value for df = 3 and α = 0.10. Since 7.21 > 6.25, we reject the
null hypothesis H₀ at the α = 0.10 significance level that the program choice at the college is independent
of gender.

(b) Table B-1 tells us that c = 7.81 is the critical value for df = 3 and α = 0.05. Here 7.21 < 7.81. Thus, at the
α = 0.05 significance level, we accept the null hypothesis H₀ that the program choice at the college is
independent of gender.


Remark: The above calculation for the degrees of freedom df is true in general. That is, suppose an 
attribute A has r categories and another attribute B has c categories yielding a table with r rows and c columns. 


Then 


df = (r—1)(c-1) 


gives the number of degrees of freedom. This comes from the fact that: 


(i) Any r—1 entries in a column determine the rth entry in the column. 


(ii) Any c—1 columns determine the cth column. 
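The expected-entry formula and the df rule of this Remark can be sketched for a general table of observed counts as follows. The function names are hypothetical; the data are the engineering-program counts of Example B.7.

```python
def expected_table(observed):
    """Expected entry = (row total)(column total) / (grand total) for every cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_square_independence(observed):
    """Return (chi-square value, degrees of freedom) for an r-by-c table of counts."""
    expected = expected_table(observed)
    chi2 = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df

# Rows: male, female. Columns: electrical, chemical, mechanical, civil.
programs = [[100, 80, 70, 50],
            [50, 50, 50, 50]]

chi2, df = chi_square_independence(programs)
print(expected_table(programs))
print(f"chi-square = {chi2:.2f}, df = {df}")
```

Only the observed counts are supplied; the marginal totals, and hence the expected entries, are derived from them.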


EXAMPLE B.8 A town asks its voters whether or not it should build a new park, where the vote could be:

(i) yes, (ii) no, (iii) abstain.

A poll of 1000 of the voters yields the following data, where voters were divided into three age categories, 18-30,
31-50, 51-70:

         Yes    No   Abstain   Total
18-30    170    60      20      250
31-50    255   140      55      450
51-70    175   100      25      300
Total    600   300     100     1000


Let H₀ be the null hypothesis that the vote is independent of age. (That is, there is no age gap in the vote.)
Test the null hypothesis at the α = 0.10 significance level.
First we find the nine expected entries in the table, where we assume independence. Specifically, we use the
formula:

$$\text{Expected entry} = \frac{(\text{row total})(\text{column total})}{\text{grand total}}$$

We add the nine expected values to the above table as follows:


Yes No Abstain Total 
18-30 170/150 60/75 20/25 250 
31-50 255/270 140/135 55/45 450 
51-70 175/180 100/90 25/30 300 
Total 600 300 100 1000 


The χ² value of the data follows:

$$\chi^2 = \sum \frac{(\text{obs} - \text{exp})^2}{\text{exp}} = \frac{(170-150)^2}{150} + \frac{(60-75)^2}{75} + \frac{(20-25)^2}{25} + \frac{(255-270)^2}{270} + \frac{(140-135)^2}{135} + \frac{(55-45)^2}{45} + \frac{(175-180)^2}{180} + \frac{(100-90)^2}{90} + \frac{(25-30)^2}{30}$$

$$\approx 2.67 + 3.00 + 1.00 + 0.83 + 0.19 + 2.22 + 0.14 + 1.11 + 0.83 = 11.99$$

The number of degrees of freedom, as noted by the above remark, is obtained by

df = (r − 1)(c − 1) = (3 − 1)(3 − 1) = 4

This is derived from the fact that the marginal values are given and so:

(i) Any two values in a column will determine the third value.

(ii) Any two columns will determine the third column.

Thus, for example, given the four entries in the upper left corner of the table, we can obtain the other five
entries.

Table B-1 tells us that c = 7.78 is the critical value for df = 4 and α = 0.10. Since 11.99 > 7.78, we reject the
null hypothesis H₀ that the vote for the park is independent of age.


B.7 CHI-SQUARE TEST FOR HOMOGENEITY 


Two populations are said to be homogeneous with respect to some grouping criteria if they have
the same percentage distribution. This section gives applications of the χ² distribution to problems
involving homogeneity, that is, whether different populations are homogeneous.

The χ² test of homogeneity in this section uses the same type of data table which was used in the
χ² test for independence in the last section. We note, however, that the χ² test of independence
involves a single population whereas the χ² test for homogeneity involves two different populations.

Again we note that the χ² test is not accurate for small values; hence, as before, we assume that
our sample exceeds 30 and that no expected frequency is less than 5.

EXAMPLE B.9 A sociologist decides to study the age distribution of adults (18 years and above) in two cities, New
York and Boston, where the distribution has three categories: under 30 years, 30-60 years, over 60 years. She
takes a sample of 150 from New York and a sample of 100 from Boston and obtains the following data:

            <30   30-60   >60   Total
New York     51     77     22     150
Boston       29     63      8     100
Total        80    140     30     250


Let H₀ be the null hypothesis that the adult age distribution in New York and Boston is homogeneous. Test the
null hypothesis at the α = 0.05 significance level.
The following is the main idea behind our testing procedure: 


If the cities are homogeneous, then the data from the combined population will give a better estimate of 


the age percentages than the data from either individual city. 


Thus, we estimate

$$\text{Percentage under 30} \approx \frac{\text{total number under 30 in two cities}}{\text{total number sampled}} = \frac{\text{column total}}{\text{grand total}} = \frac{80}{250} = 32\%$$

Accordingly, with a sample of 150 from New York and 100 from Boston, we would expect the following number
of adults under 30 in each sample:

New York: 32%(150) = 48        Boston: 32%(100) = 32

Observe that we are multiplying each row total by the corresponding percentage (column total/grand
total). Accordingly, we can again obtain these results using the following formula:

$$\text{Expected entry} = \frac{(\text{row total})(\text{column total})}{\text{grand total}}$$

Thus, we could have proceeded as follows:

$$\text{New York: } \frac{(150)(80)}{250} = 48 \qquad \text{Boston: } \frac{(100)(80)}{250} = 32$$
The other four expected entries are obtained similarly. 
We add the six expected values to our original table as follows: 

            <30     30-60    >60    Total
New York   51/48    77/84   22/18    150
Boston     29/32    63/56    8/12    100
Total       80       140      30     250

The χ² value of the data follows:

$$\chi^2 = \sum \frac{(\text{obs} - \text{exp})^2}{\text{exp}} = \frac{(51-48)^2}{48} + \frac{(77-84)^2}{84} + \frac{(22-18)^2}{18} + \frac{(29-32)^2}{32} + \frac{(63-56)^2}{56} + \frac{(8-12)^2}{12}$$

$$= \frac{9}{48} + \frac{49}{84} + \frac{16}{18} + \frac{9}{32} + \frac{49}{56} + \frac{16}{12} = 4.15$$


There are df = (2 — 1)(3 — 1) = 2 degrees of freedom. This is derived from the fact that the marginal values 
are given, and so: 


(i) Any one value in a column will determine the other value. 


(ii) Any two columns will determine the third column. 


Thus, for example, given the first two entries in the first row, we can obtain the other four entries in the table. 

Table B-1 tells us that c = 5.99 is the critical value for df = 2 and α = 0.05 (and c = 4.61 for α = 0.10). Since
4.15 < 4.61 < 5.99, we accept the null hypothesis H₀ at the α = 0.05 significance level (and also at the α = 0.10
level), that is, that the age distribution of adults in New York and Boston is homogeneous.


Remark: The above calculation for the degrees of freedom df is true in general. That is, if the data table has
r rows and c columns, then the following formula gives the number of degrees of freedom:


df = (r—1)(c - 1) 


As noted above, this formula comes from the fact that any r — 1 entries in a column determine the rth entry in 
the column, and any c — 1 columns determine the cth column. 
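The homogeneity procedure (pool the samples to estimate the category percentages, then apply those percentages to each sample size) can be sketched as follows. The variable names are illustrative, and the data are the New York and Boston samples of Example B.9.

```python
# Observed counts per city for the categories <30, 30-60, >60.
samples = {
    "New York": [51, 77, 22],
    "Boston": [29, 63, 8],
}

sizes = {city: sum(counts) for city, counts in samples.items()}
grand_total = sum(sizes.values())

# Pooled estimate of each category's percentage (column total / grand total).
pooled = [sum(column) / grand_total for column in zip(*samples.values())]

# Expected counts: pooled percentage times the size of each city's sample.
expected = {city: [p * sizes[city] for p in pooled] for city in samples}

chi2 = sum((o - e) ** 2 / e
           for city in samples
           for o, e in zip(samples[city], expected[city]))
df = (len(samples) - 1) * (len(pooled) - 1)

print(pooled)
print(f"chi-square = {chi2:.2f}, df = {df}")
```

The arithmetic is the same as in the test for independence; only the interpretation of the rows (separate populations rather than one population) differs.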


Solved Problems 


GOODNESS OF FIT 


B.1. A die is tossed 60 times yielding the following distribution:

Face value | 1 2 3 4 5 6

Frequency | 7 11 10 14 6 12

Let H₀ be the null hypothesis that the die is fair.

(a) Find the χ² value. (b) Find the degrees of freedom df.
(c) Test H₀ at the α = 0.10 significance level.


(a) The die was tossed 60 times and there are 6 possible face values. Therefore, assuming the die is fair,
the expected number of times each face occurs is 10. Thus, the χ² value for the data follows:

$$\chi^2 = \sum \frac{(\text{obs} - \text{exp})^2}{\text{exp}} = \frac{(7-10)^2}{10} + \frac{(11-10)^2}{10} + \frac{(10-10)^2}{10} + \frac{(14-10)^2}{10} + \frac{(6-10)^2}{10} + \frac{(12-10)^2}{10} = 4.6$$

(b) There are df = 6 − 1 = 5 degrees of freedom. This is derived from the fact that the die was tossed
60 times, so the number of times 5 of the faces occur determines the number of times the sixth face
occurs.

(c) Table B-1 shows that the critical χ² value for df = 5 at the α = 0.10 significance level is
c = 9.24. Since 4.6 < 9.24, we accept the null hypothesis H₀ that the die is fair.
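A quick numerical check of this problem, as a sketch with illustrative variable names: under a fair die every face has the same expected frequency, so only the observed counts are needed.

```python
frequencies = [7, 11, 10, 14, 6, 12]   # observed counts for faces 1..6

tosses = sum(frequencies)
expected_each = tosses / len(frequencies)   # fair die: equal expected counts

chi2 = sum((f - expected_each) ** 2 / expected_each for f in frequencies)
df = len(frequencies) - 1

print(f"expected per face = {expected_each}, chi-square = {chi2:.1f}, df = {df}")
```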


B.2. Suppose the following table gives the percentage of the number of persons per household in the
United States for a given year.


Number of persons | 1 2 3 4 5 or more 


Percentage of households | 20 30 18 15 17 
Suppose a survey of 1000 households in Philadelphia for the year yielded the following data: 


Number of persons | 1 2 3 4 5 or more 


Number of households | 270 210 200 100 220 


Let H₀ be the (null) hypothesis that the distribution of people in households in Philadelphia is
the same as the national distribution.

(a) Find the χ² value. (c) Test H₀ at the α = 0.10 significance level.
(b) Find the degrees of freedom df. (d) Test H₀ at the α = 0.05 significance level.

(a) Since there are 1000 households, we divide each data value by 1000 to obtain the following
percentages for Philadelphia:

Number of persons | 1 2 3 4 5 or more

Percentage of households | 27 21 20 10 22

Thus, the χ² value for the data follows:

$$\chi^2 = \sum \frac{(\text{obs} - \text{exp})^2}{\text{exp}} = \frac{(27-20)^2}{20} + \frac{(21-30)^2}{30} + \frac{(20-18)^2}{18} + \frac{(10-15)^2}{15} + \frac{(22-17)^2}{17}$$

$$= \frac{49}{20} + \frac{81}{30} + \frac{4}{18} + \frac{25}{15} + \frac{25}{17} = 8.50$$

(b) There are df = 5 − 1 = 4 degrees of freedom. This is derived from the fact that four of the five
percentages determine the fifth percentage.

(c) Table B-1 shows that the critical χ² value for df = 4 at the α = 0.10 significance level is
c = 7.78. Since 8.50 > 7.78, we reject (at the α = 0.10 significance level) the null hypothesis H₀ that
the Philadelphia distribution is similar to the national distribution.

(d) Table B-1 shows that the critical χ² value for df = 4 at the α = 0.05 significance level is
c = 9.49. Since 8.50 < 9.49, we accept (at the α = 0.05 significance level) the null hypothesis H₀ that
the Philadelphia distribution is similar to the national distribution.


B.3. A poll is taken of 160 families in New York with 4 children yielding the following family sex
distribution (where B denotes boys and G denotes girls):

Sex distribution | 4B 3B,1G 2B,2G 1B,3G 4G

Frequency | 9 46 54 38 13

Let H₀ be the null hypothesis that the New York distribution is binomial with p = 1/2.

(a) Find the expected distribution. (c) Find the degrees of freedom df.
(b) Find the χ² value. (d) Test H₀ at the α = 0.10 significance level.


(a) The binomial distribution with n = 4 and p = 0.5 follows:

x | 0 1 2 3 4

P(x) | 1/16 4/16 6/16 4/16 1/16

Multiplying the probabilities by 160, the number of families, gives the following expected
distribution.

Sex distribution | 4B 3B,1G 2B,2G 1B,3G 4G

Expected frequency | 10 40 60 40 10

(b) Thus, the χ² value for the data follows:

$$\chi^2 = \sum \frac{(\text{obs} - \text{exp})^2}{\text{exp}} = \frac{(9-10)^2}{10} + \frac{(46-40)^2}{40} + \frac{(54-60)^2}{60} + \frac{(38-40)^2}{40} + \frac{(13-10)^2}{10}$$

$$= \frac{1}{10} + \frac{36}{40} + \frac{36}{60} + \frac{4}{40} + \frac{9}{10} = 2.6$$

(c) There are df = 5 − 1 = 4 degrees of freedom. This is derived from the fact that the sum of the five
numbers in the table is 160, so any four of the numbers determine the fifth number.

(d) Table B-1 shows that the critical χ² value for df = 4 at the α = 0.10 significance level is
c = 7.78. Since 2.6 < 7.78, we accept the null hypothesis H₀ that the distribution is binomial with
p = 1/2.


B.4. A resort has 200 cabins which can sleep up to 4 people. Suppose the following table gives the
overnight occupancy of the cabins for some night. 


Number of people in room | 0 1 2 3 4

Number of rooms | 7 34 55 80 24

Let H₀ be the null hypothesis that the occupancy distribution is binomial.
(a) Find the expected distribution. (c) Find the degrees of freedom df.
(b) Find the χ² value. (d) Test H₀ at the α = 0.10 significance level.


Here n = 4 but p is not given. Thus, we use the sample proportion p̂ of the number of occupied beds
as an estimate of p. The number s of people in all the rooms follows:

s = 0(7) + 1(34) + 2(55) + 3(80) + 4(24) = 480

The total number b of beds is 4(200) = 800. Thus, we set

$$\hat{p} = \frac{s}{b} = \frac{480}{800} = 0.6$$

(a) The binomial distribution with n = 4, p = 0.6, and q = 1 − 0.6 = 0.4 follows:

$$P(0) = (0.4)^4 = 0.0256, \quad P(1) = 4(0.6)(0.4)^3 = 0.1536, \quad P(2) = 6(0.6)^2(0.4)^2 = 0.3456,$$
$$P(3) = 4(0.6)^3(0.4) = 0.3456, \quad P(4) = (0.6)^4 = 0.1296$$

Multiplying the probabilities by 200, the number of rooms, yields the following expected distribution:


Number of people in room | 0 1 2 3 4 


Expected number of rooms | 5 31 69 69 26 


(b) Thus, the χ² value for the data follows:

$$\chi^2 = \sum \frac{(\text{obs} - \text{exp})^2}{\text{exp}} = \frac{(7-5)^2}{5} + \frac{(34-31)^2}{31} + \frac{(55-69)^2}{69} + \frac{(80-69)^2}{69} + \frac{(24-26)^2}{26}$$

$$= \frac{4}{5} + \frac{9}{31} + \frac{196}{69} + \frac{121}{69} + \frac{4}{26} = 5.84$$

(c) There are df = 5 − 2 = 3 degrees of freedom. As in Example B.4, two statistics are taken from the
sample, the number of rooms (200) and the proportion p̂ (equivalently, 480 people), so the five numbers
in the table are related by two equations and any three of them determine the remaining two.

(d) Table B-1 shows that the critical χ² value for df = 3 at the α = 0.10 significance level is
c = 6.25. Since 5.84 < 6.25, we accept the null hypothesis H₀ that the distribution is binomial.


GOODNESS OF FIT FOR NORMAL DISTRIBUTION 


B.5. Suppose the weights W, in pounds, of 500 male students at a college have the
following distribution:


Weight | <120 120-140 140-160 160-180 180-200 >200 


Number of students | 37 91 128 150 75 19 


Let H₀ be the null hypothesis that the distribution is normal with mean μ = 160 and standard
deviation σ = 25.

(a) Find the expected distribution. (c) Find the degrees of freedom df.
(b) Find the χ² value. (d) Test H₀ at the following significance levels:
(i) α = 0.10, (ii) α = 0.05.


Using the formula z = (W − μ)/σ, we derive the following z values corresponding to the above W


W value | 120 140 160 180 200 


z value | —1.6 —0.8 0 0.8 1.6 


Figure B-8 shows the normal curve with the W values, the corresponding z values, and the probability 
distribution for these z values obtained from Table 6-1 (page 184) of the standard normal distribution. 

[Fig. B-8: standard normal curve with the W values 120, 140, 160, 180, 200 (pounds) marked against the z values −1.6, −0.8, 0, 0.8, 1.6]


(a) Multiplying the probabilities in Fig. B-8 by 500 gives the following expected numbers of students for
the given weight intervals:

Weight | <120 120-140 140-160 160-180 180-200 >200

Number of students | 28 78 144 144 78 28

(b) The χ² value for the data follows:

$$\chi^2 = \sum \frac{(\text{obs} - \text{exp})^2}{\text{exp}} = \frac{(37-28)^2}{28} + \frac{(91-78)^2}{78} + \frac{(128-144)^2}{144} + \frac{(150-144)^2}{144} + \frac{(75-78)^2}{78} + \frac{(19-28)^2}{28}$$

$$= \frac{81}{28} + \frac{169}{78} + \frac{256}{144} + \frac{36}{144} + \frac{9}{78} + \frac{81}{28} = 10.10$$

(c) There are df = 6 − 1 = 5 degrees of freedom. This is derived from the fact that the sum of the six
numbers in the table is 500, so any five of the numbers determine the sixth number.

(d) The row df = 5 in Table B-1 shows that the critical χ² value is c = 9.24 for α = 0.10 and c = 11.07 for
α = 0.05. Since χ² = 10.10, we: (i) reject H₀ for α = 0.10, (ii) accept H₀ for α = 0.05.


B.6. Suppose the average hourly daily workload x of 600 American employees yielded the following
data:


Hourly workload x | <5 5-6 6-7 7-8 8-9 9-10 >10 


Number of employees | 8 45 150 210 130 40 17 


Suppose x̄ = 7.5 is the sample mean and s = 1.2 is the sample standard deviation. Let H₀ be
the null hypothesis that the distribution is normal with the estimation that the mean μ = x̄ = 7.5
and the standard deviation σ = s = 1.2.
(a) Find the expected distribution. (c) Find the degrees of freedom df.
(b) Find the χ² value. (d) Test H₀ at the following significance levels:
(i) α = 0.10, (ii) α = 0.05.

Using the formula z = (x − μ)/σ, we derive the following z values corresponding to the above x
values:

x value | 5 6 7 8 9 10

z value | −2.08 −1.25 −0.42 0.42 1.25 2.08


Figure B-9 shows the normal curve with the x values, the corresponding z values, and the probability 
distribution for these z values obtained from Table 6-1 of the standard normal distribution. 


[Fig. B-9: standard normal curve with the x values 5, 6, 7, 8, 9, 10 marked against the z values −2.08, −1.25, −0.42, 0.42, 1.25, 2.08]


(a) Multiplying the probabilities in Fig. B-9 by 600 gives the following expected numbers of employees 
for the given time intervals: 


Hourly workload | <5 5-6 6-7 7-8 8-9 9-10 >10 


Number of employees | 11 52 139 196 139 52 11 


(b) The χ² value for the data follows:

$$\chi^2 = \sum \frac{(\text{obs} - \text{exp})^2}{\text{exp}} = \frac{(8-11)^2}{11} + \frac{(45-52)^2}{52} + \frac{(150-139)^2}{139} + \frac{(210-196)^2}{196} + \frac{(130-139)^2}{139} + \frac{(40-52)^2}{52} + \frac{(17-11)^2}{11}$$

$$= \frac{9}{11} + \frac{49}{52} + \frac{121}{139} + \frac{196}{196} + \frac{81}{139} + \frac{144}{52} + \frac{36}{11} = 10.26$$

(c) The seven numbers in our table are related by three equations, one determined by the size n = 600,
one by the sample mean x̄ = 7.5, and one by the sample standard deviation s = 1.2. Thus, there are
only df = 7 − 3 = 4 degrees of freedom.

(d) The row df = 4 in Table B-1 shows that the critical χ² value is c = 7.78 for α = 0.10 and c = 9.49 for
α = 0.05. Since χ² = 10.26 exceeds both critical values, we reject H₀ (i) for α = 0.10 and (ii) for α = 0.05.

INDEPENDENCE


B.8. Voters in a certain town can only register as Democratic, Republican, or Independent. A poll
of 800 registered voters yields the following gender distribution:


Democratic Republican Independent Total 
Male 140 192 68 400 
Female 160 158 82 400 
Total 300 350 150 800 


Let H₀ be the hypothesis that the party affiliation is independent of gender.

(a) Find the χ² value. (b) Find the degrees of freedom df.
(c) Test H₀ at the following significance levels: (i) α = 0.10, (ii) α = 0.05.


The six expected entries in the table, assuming independence, are obtained by the formula:

$$\text{Expected entry} = \frac{(\text{row total})(\text{column total})}{\text{grand total}}$$

For example, the expected numbers of males to register as Democrats, Republicans, and Independents,
respectively, follow:

$$\frac{(400)(300)}{800} = 150, \qquad \frac{(400)(350)}{800} = 175, \qquad \frac{(400)(150)}{800} = 75$$
Adding the six expected values to the above table yields the following: 
Democratic Republican Independent Total 
Male 140/150 192/175 68/75 400 
Female 160/150 158/175 82/75 400 
Total 300 350 150 800 


(a) Using the above table for the (obs − exp)² values, we obtain the χ² value as follows:

$$\chi^2 = \sum \frac{(\text{obs} - \text{exp})^2}{\text{exp}} = \frac{100}{150} + \frac{289}{175} + \frac{49}{75} + \frac{100}{150} + \frac{289}{175} + \frac{49}{75} = 5.94$$

(b) The number of degrees of freedom is obtained by

df = (r − 1)(c − 1) = (2 − 1)(3 − 1) = 2

This is derived from the fact that the marginal values are given; hence:

(i) Any value in a column will determine the other value.

(ii) Any two values in a row will determine the third value.

Thus, for example, given the first two values in the first row, we can determine the remaining four
values.

(c) The row df = 2 in Table B-1 shows that the critical χ² value is c = 4.61 for α = 0.10 and c = 5.99 for
α = 0.05. Since χ² = 5.94, we: (i) reject H₀ for α = 0.10, (ii) accept H₀ for α = 0.05.




B.10. A grocery chain of stores carries four brands A, B, C, D of a certain type of cereal. The chain 
recorded the brand of the cereal sold and the age of the buyer, where the buyers were divided 
into three age categories: younger than 20, 20-40, older than 40. The frequency distribution 
during 1 week follows: 


A B C D Total
<20 90 64 78 48 280 
20-40 88 78 70 64 300 
>40 62 58 52 48 220 
Total 240 200 200 160 800 


Let H₀ be the hypothesis that the cereal choice is independent of age. (a) Find the χ²
value. (b) Find the degrees of freedom df. (c) Test H₀ at the α = 0.10 significance level.


The 12 expected entries in the table, assuming independence, are obtained by the formula:

$$\text{Expected entry} = \frac{(\text{row total})(\text{column total})}{\text{grand total}}$$


Adding the 12 expected values to the above table yields the following: 


A B C D Total 
<20 90/84 64/70 78/70 48/56 280 
20-40 88/90 78/75 70/75 64/60 300 
>40 62/66 58/55 52/55 48/44 220 
Total 240 200 200 160 800 


(a) The χ² value for the data follows:

$$\chi^2 = \sum \frac{(\text{obs} - \text{exp})^2}{\text{exp}} = \frac{36}{84} + \frac{36}{70} + \frac{64}{70} + \frac{64}{56} + \frac{4}{90} + \frac{9}{75} + \frac{25}{75} + \frac{16}{60} + \frac{16}{66} + \frac{9}{55} + \frac{9}{55} + \frac{16}{44} = 4.70$$

(b) The number of degrees of freedom is obtained by

df = (r − 1)(c − 1) = (3 − 1)(4 − 1) = 6

This is derived from the fact that the marginal values are given; hence any two values in a column
will determine the third value, and any three values in a row will determine the fourth value. Thus,
for example, the first three values in the first two rows determine the remaining six values.

(c) Table B-1 shows that the critical χ² value for df = 6 and α = 0.10 is c = 10.64. Since 4.70 < 10.64,
we accept the null hypothesis H₀ that the choice of cereal is independent of age.
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HOMOGENEITY 


B.11. Suppose an opinion poll on a referendum in 4 city districts yields the following data: 


Yes No Undecided | Total 
District 3 18 20 12 50 
District 8 26 16 8 50 
District 11 20 24 6 50 
District 16 28 12 10 50 
Total 92 72 36 200 


Let $H_0$ be the hypothesis that the voter opinion on the referendum is homogeneous in the 4 
districts. (a) Find the $\chi^2$ value. (b) Find the degrees of freedom df. (c) Test $H_0$ at the 
$\alpha = 0.10$ significance level. 


First we find the 12 expected entries in the table. Assuming the combined population of 200 voters 
will give a better estimate of voter opinion than any individual district, we find the expected entries using 
the following formula: 


$$\text{Expected entry} = \frac{(\text{row total})(\text{column total})}{\text{grand total}}$$


We add the 12 expected values to the above table as follows: 


              Yes     No      Undecided   Total
District 3    18/23   20/18   12/9        50
District 8    26/23   16/18   8/9         50
District 11   20/23   24/18   6/9         50
District 16   28/23   12/18   10/9        50
Total         92      72      36          200


(a) The $\chi^2$ value for the data follows: 


$$\chi^2 = \sum \frac{(\text{obs} - \text{exp})^2}{\text{exp}} = \frac{25}{23} + \frac{4}{18} + \frac{9}{9} + \frac{9}{23} + \frac{4}{18} + \frac{1}{9} + \frac{9}{23} + \frac{36}{18} + \frac{9}{9} + \frac{25}{23} + \frac{36}{18} + \frac{1}{9} \approx 9.62$$


(b) The number of degrees of freedom is obtained by 


$$df = (r-1)(c-1) = (4-1)(3-1) = 6$$


This follows from the fact that the marginal totals are given; hence any three values in a column 
determine the fourth value, and any two values in a row determine the third value. 
Thus, for example, the first two values in the first three rows determine the remaining six values. 


(c) Table B-1 shows that the critical $\chi^2$ value for df = 6 and $\alpha = 0.10$ is c = 10.64. Since 9.62 < 10.64, 
we accept the null hypothesis $H_0$ that the voter opinion on the referendum is homogeneous in the 
4 districts. 


B.12. Suppose the following table gives the distribution of geometry grades in 2 high schools. 


                A     B     C     D     F     Total
High school 1   27    39    60    42    32    200
High school 2   28    51    120   68    33    300
Total           55    90    180   110   65    500


Let $H_0$ be the hypothesis that the grade distributions are homogeneous in the 2 high 
schools. (a) Find the $\chi^2$ value. (b) Find the degrees of freedom df. (c) Test $H_0$ at the 
following significance levels: (i) $\alpha = 0.10$, (ii) $\alpha = 0.05$. 

First we find the 10 expected entries in the table. Assuming the combined population of 500 students 
will give a better estimate of the grade distribution than either individual school, we find the expected 
entries using the following formula: 


$$\text{Expected entry} = \frac{(\text{row total})(\text{column total})}{\text{grand total}}$$


We add the 10 expected values to the above table as follows: 


                A       B       C         D       F       Total
High school 1   27/22   39/36   60/72     42/44   32/26   200
High school 2   28/33   51/54   120/108   68/66   33/39   300
Total           55      90      180       110     65      500


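(a) The $\chi^2$ value for the data follows: 


$$\chi^2 = \sum \frac{(\text{obs} - \text{exp})^2}{\text{exp}} = \frac{25}{22} + \frac{9}{36} + \frac{144}{72} + \frac{4}{44} + \frac{36}{26} + \frac{25}{33} + \frac{9}{54} + \frac{144}{108} + \frac{4}{66} + \frac{36}{39} \approx 8.10$$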


(b) The number of degrees of freedom is obtained by 


$$df = (r-1)(c-1) = (2-1)(5-1) = 4$$


This is derived from the fact that the marginal values are given; hence: 


(i) Any value in a column will determine the other value. 


(ii) Any four values in a row will determine the fifth value. 


Thus, for example, the first four values in the first row determine the remaining six values. 


(c) The row df = 4 in Table B-1 shows that the critical $\chi^2$ value is c = 7.78 for $\alpha = 0.10$ and c = 9.49 for 
$\alpha = 0.05$. Since $\chi^2 = 8.10$, we: (i) reject $H_0$ for $\alpha = 0.10$, (ii) accept $H_0$ for $\alpha = 0.05$. 
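

The three steps of Problem B.12 (expected entries, $\chi^2$ value, comparison with the critical value) can be checked with a short Python sketch. It is an illustration under stated assumptions rather than the book's own method: the function name `chi_square_statistic` and the variable names are hypothetical, and the critical values are simply the ones quoted from Table B-1 in part (c), not computed.

```python
def chi_square_statistic(observed):
    """Return (chi2, df) for a table of observed counts given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / grand_total  # expected entry
            chi2 += (obs - exp) ** 2 / exp
    df = (len(observed) - 1) * (len(observed[0]) - 1)  # (r - 1)(c - 1)
    return chi2, df


# Geometry grades A, B, C, D, F for high schools 1 and 2 (Problem B.12).
grades = [
    [27, 39, 60, 42, 32],
    [28, 51, 120, 68, 33],
]
# Critical values for df = 4 as quoted from Table B-1 (assumed, not computed here).
critical = {0.10: 7.78, 0.05: 9.49}

chi2, df = chi_square_statistic(grades)
print(round(chi2, 2), df)
for alpha, c in critical.items():
    decision = "reject" if chi2 > c else "accept"
    print(f"alpha = {alpha}: {decision} H0")
```

The same function applies unchanged to tests of independence, since both kinds of test use the same expected-entry formula and the same df.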


Supplementary Problems 


GOODNESS OF FIT 


B.13. A coin is tossed 80 times, yielding 48 heads and 32 tails. Let $H_0$ be the hypothesis that the coin is fair. 
(a) Find the $\chi^2$ value and the degrees of freedom df. 
(b) Test $H_0$ at the following significance levels: (i) $\alpha = 0.10$, (ii) $\alpha = 0.05$. 


B.14. Suppose the frequency of each digit in the first 100 digits in a random number yields the following 
distribution: 


Digit     | 0    1    2    3    4    5    6    7    8    9
Frequency | 7    11   12   9    6    13   11   10   9    12


Let $H_0$ be the hypothesis that each digit occurs with the same probability. (a) Find the $\chi^2$ value and the 
degrees of freedom df. (b) Test $H_0$ at the $\alpha = 0.05$ significance level. 


B.15. The following table lists the grading policy of a department for a sophomore mathematics course and the 
number of such grades given by a professor to 120 of her students. 


Grade  | A     B     C     D     F
Policy | 10%   40%   35%   10%   5%
Course | 18    55    34    7     6


Let $H_0$ be the hypothesis that the professor conforms to department policy. (a) Find the expected 
distribution. (b) Find the $\chi^2$ value and the degrees of freedom df. (c) Test $H_0$ at the following 
significance levels: (i) $\alpha = 0.10$, (ii) $\alpha = 0.05$. 


B.16. It is estimated that 60 percent of cola drinkers prefer Coke over Pepsi. In a random poll of 600 cola 
drinkers, 330 preferred Coke over Pepsi. (a) Find the expected distribution and the $\chi^2$ value. (b) Test 
the hypothesis $H_0$ that the estimate is correct at the $\alpha = 0.10$ significance level. 


B.17. It is estimated that the political preferences in a certain town are as follows: 

35% Democrat, 40% Republican, 15% Independent, 10% other 

A random sample of 200 people resulted in the following preferences: 

64 Democrat, 76 Republican, 38 Independent, 22 other 

Let $H_0$ be the hypothesis that the estimate is correct. 
(a) Find the $\chi^2$ value and the degrees of freedom df. 
(b) Test $H_0$ at the $\alpha = 0.10$ significance level. 


B.18. The following table gives the age percentages of people living in the United States for some given year 
(using four age categories: under 20, 20-39, 40-64, 65 and over), and the age distribution of a sample of 
500 people living in Florida: 


Age           | <20    20-39   40-64   ≥65
United States | 28%    24%     32%     16%
Florida       | 115    130     155     100


Let $H_0$ be the hypothesis that the age distribution in Florida is the same as the national distribution. (a) 
Find the expected distribution. (b) Find the $\chi^2$ value and the degrees of freedom df. (c) Test $H_0$ at the 
$\alpha = 0.10$ significance level. 


BINOMIAL DISTRIBUTION 


B.19. Applicants for a civil service position take a national test with three separate parts. The following table 
gives the number of parts passed by each of 500 applicants: 


Number of parts passed | 0     1     2     3
Number of applicants   | 180   200   100   20


Let $H_0$ be the hypothesis that the distribution is binomial with p = 0.3. (a) Find the expected distribution. 
(b) Find the $\chi^2$ value and the degrees of freedom df. (c) Test $H_0$ at the $\alpha = 0.10$ significance level. 


B.20. A study is made of the number of children in a 4-child family who have attended college. Interviews with 
500 families produced the following data: 


Number of children | 0     1     2     3     4
Number of families | 75    170   150   90    15


Let $H_0$ be the hypothesis that the distribution is binomial. (a) Estimate the population proportion p of 
children attending college by the sample proportion $\hat{p}$. (b) Find the expected distribution using $p = \hat{p}$. 
(c) Find the $\chi^2$ value and the degrees of freedom df. (d) Test $H_0$ at the $\alpha = 0.10$ significance level. 


NORMAL DISTRIBUTION 


B.21. Suppose the following table gives the average daily minutes of time T spent watching television by a 
sample of 400 10-year-old children. 


Time               | <600   600-1000   1000-1400   1400-1800   >1800
Number of children | 20     82         160         100         38


Let $H_0$ be the hypothesis that the distribution is normal with mean $\mu = 1200$ and standard deviation 
$\sigma = 400$. (a) Find the z values corresponding to T = 600, 1000, 1400, 1800. (b) Find the expected 
distribution. (c) Find the $\chi^2$ value and the degrees of freedom df. (d) Test $H_0$ at the following 
significance levels: (i) $\alpha = 0.10$, (ii) $\alpha = 0.05$. 


B.22. Suppose the following gives the number x of eggs produced annually by 200 chickens at a farm. 


Eggs               | <280   280-300   300-320   320-340   340-360   >360
Number of chickens | 10     33        70        55        20        12


Furthermore, suppose $\bar{x} = 315$ is the sample mean and s = 25 is the sample standard deviation. Let $H_0$ 
be the hypothesis that the distribution is normal with the estimation that the mean $\mu = \bar{x} = 315$ and the 
standard deviation $\sigma = s = 25$. (a) Find the z values corresponding to x = 280, 300, 320, 340, 360. 
(b) Find the (approximate) expected distribution. (c) Find the $\chi^2$ value and the degrees of freedom df. 
(d) Test $H_0$ at the $\alpha = 0.10$ significance level. 


INDEPENDENCE 


B.23. Voters in a certain town can only register as Democratic, Republican, or Independent. A poll of 500 
registered voters yields the following gender distribution: 


         Democratic   Republican   Independent   Total
Male     95           125          40            260
Female   105          100          35            240
Total    200          225          75            500


Let $H_0$ be the hypothesis that the party affiliation is independent of gender. (a) Find the expected 
distribution. (b) Find the $\chi^2$ value and the degrees of freedom df. (c) Test $H_0$ at the $\alpha = 0.10$ significance 
level. 


B.24. Suppose a large university wants to determine the student opinion (favor or oppose) on the requirement 
of a certain dress code for attending classes. A poll of 100 students per class is taken yielding the 
following table: 


         Freshman   Sophomore   Junior   Senior   Total
Favor    35         42          45       58       180
Oppose   65         58          55       42       220
Total    100        100         100      100      400


Let $H_0$ be the hypothesis that the opinion is independent of the class of the student. (a) Find the $\chi^2$ value 
and the degrees of freedom df. (b) Test $H_0$ at the $\alpha = 0.10$ significance level. 


HOMOGENEITY 


B.25. A study is made of political party affiliation of voters in three regions of the country, northeast, south, and 
west. Interviews with 500 voters yielded the following distribution: 


            Democrat   Republican   Other   Total
Northeast   105        79           16      200
South       60         82           8       150
West        65         79           6       150
Total       230        240          30      500


Let $H_0$ be the hypothesis that the distribution is homogeneous. 
(a) Find the expected distribution. (b) Find the $\chi^2$ value and the degrees of freedom df. 
(c) Test $H_0$ at the following significance levels: (i) $\alpha = 0.10$, (ii) $\alpha = 0.05$, (iii) $\alpha = 0.025$. 


B.26. A study is made of the grades of full-time and part-time students for a freshman mathematics course at a 
university yielding the following data: 


            A     B     C     D     F     Total
Full time   32    58    70    50    30    240
Part time   18    32    50    30    30    160
Total       50    90    120   80    60    400


Let $H_0$ be the hypothesis that the distribution is homogeneous. (a) Find the expected distribution. 
(b) Find the $\chi^2$ value and the degrees of freedom df. (c) Test $H_0$ at the $\alpha = 0.10$ significance level. 


Answers to Supplementary Problems 
B.13. (a) $\chi^2 = 64/40 + 64/40 = 3.20$, df = 1; (b) (i) no, (ii) yes. 


B.14. (a) $\chi^2 = 4.6$, df = 9; (b) yes. 


B.15. (a) [12, 48, 42, 12, 6]; (b) $\chi^2 = 36/12 + 49/48 + 64/42 + 25/12 + 0/6 = 7.62$, df = 4; (c) (i) yes, (ii) yes. 


B.16. (a) [360, 240], $\chi^2 = 900/360 + 900/240 = 6.25$; (b) no. 


B.17. (a) $\chi^2 = 36/70 + 16/80 + 64/30 + 4/20 = 3.05$, df = 3; (b) yes. 

B.18. (a) [140, 120, 160, 80]; (b) $\chi^2 = 10.45$, df = 3; (c) no. 

B.19. (a) [171.5, 220.5, 94.5, 13.5]; (b) $\chi^2 = 72.25/171.5 + 420.25/220.5 + 30.25/94.5 + 42.25/13.5 = 5.78$, df = 3; 
(c) yes. 


B.20. (a) $\hat{p} = 0.4$; (b) [64.8, 172.8, 172.8, 76.8, 12.8]; 
(c) $\chi^2 = 104.04/64.8 + 7.84/172.8 + 519.84/172.8 + 174.24/76.8 + 4.84/12.8 \approx 7.31$, df = 4; (d) yes. 


B.21. (a) [-1.5, -0.5, 0.5, 1.5]; (b) [27, 96, 153, 96, 27]; 
(c) $\chi^2 = 49/27 + 196/96 + 49/153 + 16/96 + 121/27 = 8.82$, df = 4; (d) (i) no, (ii) yes. 


B.22. (a) [-1.4, -0.6, 0.2, 1.0, 1.8]; (b) [16, 39, 61, 52, 25, 7]; 
(c) $\chi^2 = 36/16 + 36/39 + 81/61 + 9/52 + 25/25 + 25/7 = 9.25$, df = 3; (d) no. 


B.23. (a) [104, 117, 39; 96, 108, 36]; (b) $\chi^2 = 81/104 + 64/117 + 1/39 + 81/96 + 64/108 + 1/36 = 2.82$, df = 2; 
(c) yes. 


B.24. (a) $\chi^2 = 11.23$, df = 3; (b) no. 


B.25. (a) [92, 96, 12; 69, 72, 9; 69, 72, 9]; 
(b) $\chi^2 = 169/92 + 289/96 + 16/12 + 81/69 + 100/72 + 1/9 + 16/69 + 49/72 + 9/9 = 10.76$, df = 4; 
(c) (i) no, (ii) no, (iii) yes. 


B.26. (a) [30, 54, 72, 48, 36; 20, 36, 48, 32, 24]; 
(b) $\chi^2 = 4/30 + 16/54 + 4/72 + 4/48 + 36/36 + 4/20 + 16/36 + 4/48 + 4/32 + 36/24 = 3.92$, df = 4; 
(c) yes. 
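

Readers who want to check goodness-of-fit answers numerically can use a sketch along the following lines. It is offered as a hedged illustration, not as the book's procedure: the helper names `goodness_of_fit` and `binomial_expected` are hypothetical, the data are the digit frequencies of Problem B.14 and the binomial setting of Problem B.19 (n = 3, p = 0.3, 500 applicants), and df is taken as the number of classes minus 1, which assumes no parameter was estimated from the sample.

```python
from math import comb


def goodness_of_fit(observed, expected):
    """Return (chi2, df), with df = number of classes - 1 (no estimated parameters)."""
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, len(observed) - 1


def binomial_expected(n, p, total):
    """Expected counts total * C(n, k) p^k (1 - p)^(n - k) for k = 0, ..., n."""
    return [total * comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


# Problem B.14: frequencies of the digits 0-9 among 100 digits, uniform hypothesis.
digits = [7, 11, 12, 9, 6, 13, 11, 10, 9, 12]
uniform = [sum(digits) / len(digits)] * len(digits)
chi2, df = goodness_of_fit(digits, uniform)
print(round(chi2, 2), df)

# Problem B.19(a): expected distribution under B(3, 0.3) for 500 applicants.
expected_parts = binomial_expected(3, 0.3, 500)
print([round(e, 1) for e in expected_parts])
```

The printed values can be compared with the answers to B.14(a) and B.19(a).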

